unitexlemmatizer

A simple lemmatizer based on Unitex word lists

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Programming Language
- Python :: 2.7
- Python :: 3
Topic
- Software Development :: Build Tools

Project description

This is a simple lemmatization module based on the Unitex inflected word list (DELAF). As such, it needs a Unitex dictionary file in order to work.

So far, I’ve only worked with Portuguese, using the DELAF_PB file provided by NILC.
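Each DELAF line pairs an inflected form with its lemma and grammatical codes, roughly form,lemma.CATEGORY+traits:inflections, and an empty lemma field means the lemma is identical to the form. The sketch below parses one such line to illustrate the format. It is not this package's code: DelafEntry and parse_delaf_line are hypothetical names, the sample entry is illustrative (the exact codes in DELAF_PB may differ), and backslash-escaped commas and dots, which real DELAF files can contain, are not handled.

```python
from typing import List, NamedTuple


class DelafEntry(NamedTuple):
    form: str               # inflected form, e.g. "casas"
    lemma: str              # canonical form, e.g. "casa"
    category: str           # grammatical category code, e.g. "N"
    traits: List[str]       # optional codes introduced by "+"
    inflections: List[str]  # inflection codes introduced by ":", e.g. ["fp"]


def parse_delaf_line(line: str) -> DelafEntry:
    """Parse a line shaped like 'form,lemma.CAT+trait:flex'.

    Simplification (hypothetical sketch): backslash-escaped commas
    and dots are not handled.
    """
    form, rest = line.strip().split(",", 1)
    lemma, codes = rest.split(".", 1)
    if not lemma:  # an empty lemma field means the lemma equals the form
        lemma = form
    grammar, *inflections = codes.split(":")
    category, *traits = grammar.split("+")
    return DelafEntry(form, lemma, category, traits, inflections)


if __name__ == "__main__":
    print(parse_delaf_line("casas,casa.N:fp"))
    # DelafEntry(form='casas', lemma='casa', category='N', traits=[], inflections=['fp'])
```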

Installing

You can either clone the repository and install it with:

$ python setup.py install

or install it through pip:

$ pip install unitexlemmatizer

Usage

In order to use the Unitex Lemmatizer, you need to tell it where the word list is:

>>> import unitexlemmatizer as ul
>>> ul.load_unitex_dictionary('/path/to/delaf.dic')
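Loading a word list essentially means turning its lines into a lookup table. The following is a minimal sketch, assuming a table keyed by (inflected form, category code) whose values are sets of candidate lemmas; load_lemma_table is a hypothetical helper and says nothing about how load_unitex_dictionary is actually implemented. The UTF-16 default is an assumption based on Unitex traditionally storing its text files in that encoding; pass another encoding if your copy of the dictionary differs.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def load_lemma_table(path: str,
                     encoding: str = "utf-16") -> Dict[Tuple[str, str], Set[str]]:
    """Map (inflected form, category code) to the set of candidate lemmas.

    Hypothetical sketch: escaped commas/dots in DELAF lines are not handled.
    """
    table: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    with open(path, encoding=encoding) as f:
        for raw in f:
            line = raw.strip()
            if "," not in line:
                continue  # skip blank or malformed lines
            form, rest = line.split(",", 1)
            if "." not in rest:
                continue  # no grammatical codes: not a valid entry
            lemma, _, codes = rest.partition(".")
            # The category is the code before any "+trait" or ":inflection" suffix.
            category = codes.split(":", 1)[0].split("+", 1)[0]
            # An empty lemma field means the lemma equals the inflected form.
            table[(form, category)].add(lemma or form)
    return dict(table)
```

Keeping a set per key matters because a single form can correspond to several lemmas within the same category, while including the category in the key keeps, say, a noun reading and a verb reading of the same form apart.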

Then you can call the get_lemma function, passing the inflected word and its part-of-speech tag (from the Universal Dependencies tagset):

>>> ul.get_lemma('corpora', 'noun')
'corpus'
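Because get_lemma receives a Universal Dependencies tag while DELAF entries carry Unitex category codes, some mapping between the two tagsets is needed before the dictionary can be queried. The sketch below shows one way to do that lookup. The UD_TO_UNITEX table and the lookup_lemma helper are assumptions for illustration (check the category codes against your DELAF file), and returning the word unchanged when nothing matches is a choice made by this sketch, not documented behavior of the package.

```python
from typing import Dict, Set, Tuple

# Hypothetical mapping from Universal Dependencies POS tags to Unitex
# category codes; verify it against the codes used in your DELAF file.
UD_TO_UNITEX: Dict[str, str] = {
    "NOUN": "N",
    "VERB": "V",
    "AUX": "V",
    "ADJ": "A",
    "ADV": "ADV",
    "ADP": "PREP",
    "DET": "DET",
    "PRON": "PRO",
    "CCONJ": "CONJ",
    "SCONJ": "CONJ",
    "INTJ": "INTERJ",
}


def lookup_lemma(word: str, ud_tag: str,
                 table: Dict[Tuple[str, str], Set[str]]) -> str:
    """Return a lemma for (word, UD tag), or the word itself if none is found."""
    category = UD_TO_UNITEX.get(ud_tag.upper())  # accept 'noun' as well as 'NOUN'
    if category is None:
        return word  # tag has no Unitex counterpart in this mapping
    lemmas = table.get((word, category))
    if not lemmas:
        return word  # unknown form: fall back to the surface form
    return min(lemmas)  # several candidates: pick one deterministically


if __name__ == "__main__":
    table = {("casas", "N"): {"casa"}, ("cantou", "V"): {"cantar"}}
    print(lookup_lemma("casas", "noun", table))  # prints casa
```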

Release history

This version: 1.0.0, released Jan 19, 2017

Download files

Source Distribution

unitexlemmatizer-1.0.0.tar.gz (3.0 kB)

Uploaded Jan 19, 2017 Source

Built Distributions

unitexlemmatizer-1.0.0-py2.py3-none-any.whl (4.9 kB)

Uploaded Jan 19, 2017 Python 2 Python 3

unitexlemmatizer-1.0.0-py2.7.egg (5.1 kB)

Uploaded Jan 19, 2017 Source

Hashes for unitexlemmatizer-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`6602ab1bdd8fd0946f6348718a6f6473814f81e8f77144e647dfee3645ff62a5`
MD5	`e2b5ef3622bf8939bf6a9a39ce385bb4`
BLAKE2b-256	`f57b61b0192d541ccb055603d75bf52021b9399a5cf4a2ef22f0b09a34bbc208`

Hashes for unitexlemmatizer-1.0.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`a493635169a21456d66e7587a065ec86e0fb80b926516198077335a60fd38df3`
MD5	`36a8f4d39f2d0b494158320ecb3faf6e`
BLAKE2b-256	`1ed639ad1bd2dce9bd0d90faa64373a2fee48e89fe03e95da9b7a04cded0339b`

Hashes for unitexlemmatizer-1.0.0-py2.7.egg
Algorithm	Hash digest
SHA256	`5a7a4699e10a1b37efaac2e9404e8766c0e664c907bc1a89fcd37910756dac08`
MD5	`4811bf793feb638b997305efa3654171`
BLAKE2b-256	`f9aebee3a227b4c623abd36c2354909354783a0a413e9bc11c5421c00b1ae1e9`