unitexlemmatizer 1.0.0

A simple lemmatizer based on Unitex word lists

This is a simple module for lemmatization based on the Unitex inflected word list. As such, it needs a Unitex vocabulary file in order to work properly.

So far, I have only used it with Portuguese, using the DELAF_PB file provided by NILC.

Installing

You can either clone the repository and install with

$ python setup.py install

or install through pip

$ pip install unitexlemmatizer

Usage

In order to use the Unitex Lemmatizer, you need to tell it where the word list is:

>>> import unitexlemmatizer as ul
>>> ul.load_unitex_dictionary('/path/to/delaf.dic')

Then call the get_lemma function, passing the inflected word and its part-of-speech tag (from the Universal Dependencies tagset):

>>> ul.get_lemma('corpora', 'noun')
'corpus'
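To make the lookup idea concrete, here is a minimal sketch of how a word list of this kind can be turned into a lemma index. It is not the module's actual implementation. It assumes DELAF-style entries of the form `inflected,lemma.CODE:features`; the helper names `parse_delaf_line` and `build_index` are hypothetical, and the sample entries are hand-written for illustration rather than copied from DELAF_PB.

```python
# Illustrative sketch only: this is NOT the unitexlemmatizer implementation.
# Assumption: entries look like "inflected,lemma.CODE:features" (DELAF style).


def parse_delaf_line(line):
    """Return (inflected, lemma, code) for one entry, or None if it does not fit."""
    line = line.strip()
    if not line or ',' not in line:
        return None
    inflected, rest = line.split(',', 1)
    if '.' not in rest:
        return None
    lemma, info = rest.split('.', 1)
    # Assumption: an empty lemma means the lemma equals the inflected form.
    lemma = lemma or inflected
    # The grammatical code comes before any ':' (inflection) or '+' (extra traits).
    code = info.split(':', 1)[0].split('+', 1)[0]
    return inflected, lemma, code


def build_index(lines):
    """Map (inflected form, grammatical code) to a lemma; the first entry wins."""
    index = {}
    for line in lines:
        entry = parse_delaf_line(line)
        if entry is not None:
            inflected, lemma, code = entry
            index.setdefault((inflected, code), lemma)
    return index


# Hand-written sample entries (hypothetical, for illustration only).
sample = ["corpora,corpus.N:mp", "casas,casa.N:fp", "casas,casar.V:P2s"]
index = build_index(sample)
print(index[("corpora", "N")])  # prints: corpus
```

Keying the index on both the inflected form and the grammatical code is what lets the same surface form ("casas" above) resolve to different lemmas depending on its part of speech, which is why get_lemma asks for a tag as well as the word.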
 
File                                             Type          Py Version  Uploaded on  Size
unitexlemmatizer-1.0.0-py2.7.egg (md5)           Python Egg    2.7         2017-01-19   4KB
unitexlemmatizer-1.0.0-py2.py3-none-any.whl (md5)  Python Wheel  py2.py3     2017-01-19   4KB
unitexlemmatizer-1.0.0.tar.gz (md5)              Source                    2017-01-19   2KB