PyStemmer 1.0.1
Snowball stemming algorithms, for information retrieval
Latest Version: 1.2.0
Stemming algorithms
PyStemmer provides access to efficient algorithms for calculating a "stemmed" form of a word. This is a form with most of the common morphological endings removed, ideally leaving a common linguistic base form. Stemming is most useful in building search engines and information retrieval software: for example, a search with stemming enabled should be able to find a document containing "cycling" given the query "cycles".
PyStemmer provides algorithms for several (mainly European) languages by wrapping the libstemmer library from the Snowball project in a Python module.
It also provides access to the classic Porter stemming algorithm for English. Although this has been superseded by an improved algorithm, the original may be of interest to information retrieval researchers wishing to reproduce the results of earlier experiments.
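As a rough illustration of the search use case described above, the following sketch builds a tiny inverted index keyed on stemmed words, so that a query for "cycles" can match a document containing "cycling". It assumes the module is importable as `Stemmer` and that `Stemmer.Stemmer("english")` returns an object with a `stemWord` method. The names `build_index`, `search` and the sample documents are hypothetical, and the fallback `stem` function is a crude stand-in used only when PyStemmer is not installed; it is not a Snowball algorithm.

```python
from collections import defaultdict

try:
    import Stemmer  # PyStemmer's module name (assumed to be installed)

    # Assumption: "english" selects the Snowball English stemmer.
    stem = Stemmer.Stemmer("english").stemWord
except ImportError:
    # Hypothetical stand-in, NOT a Snowball algorithm: it only strips a
    # few common endings so that the sketch still runs without PyStemmer.
    def stem(word):
        for suffix in ("ing", "es", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word


def build_index(docs):
    """Map each stemmed word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[stem(word)].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every stemmed query word."""
    results = None
    for word in query.lower().split():
        matches = index.get(stem(word), set())
        results = set(matches) if results is None else results & matches
    return results or set()


# Hypothetical sample data for illustration only.
docs = {1: "Cycling in the city", 2: "Cooking for beginners"}
index = build_index(docs)
print(search(index, "cycles"))
```

Stemming both the indexed text and the query with the same function is what makes the match possible: neither side needs to know which surface form the other used.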
- Author: Richard Boulton
- Home Page: http://snowball.tartarus.org/
- Download URL: http://snowball.tartarus.org/wrappers/PyStemmer-1.0.1.tar.gz
- Keywords: python,information retrieval,language processing,morphological analysis,stemming algorithms,stemmers
- License: MIT,BSD
- Platform: any
Categories
- Development Status :: 5 - Production/Stable
- Intended Audience :: Developers
- License :: OSI Approved :: BSD License
- License :: OSI Approved :: MIT License
- Natural Language :: Danish
- Natural Language :: Dutch
- Natural Language :: English
- Natural Language :: Finnish
- Natural Language :: French
- Natural Language :: German
- Natural Language :: Italian
- Natural Language :: Norwegian
- Natural Language :: Portuguese
- Natural Language :: Russian
- Natural Language :: Spanish
- Natural Language :: Swedish
- Operating System :: OS Independent
- Programming Language :: C
- Programming Language :: Other
- Programming Language :: Python
- Topic :: Database
- Topic :: Internet :: WWW/HTTP :: Indexing/Search
- Topic :: Text Processing :: Indexing
- Topic :: Text Processing :: Linguistic
- Package Index Owner: richardb
- DOAP record: PyStemmer-1.0.1.xml
