EstNLTK -- Open source tools for Estonian natural language processing

EstNLTK provides common natural language processing functionality such as paragraph, sentence and word tokenization, morphological analysis, named entity recognition, etc. for the Estonian language.

The project is funded by EKT (Eesti Keeletehnoloogia Riiklik Programm).

Currently, there are two branches of EstNLTK:

  • version 1.6 -- the new branch, which is in beta status and under development. Version 1.6.9beta is available from the Anaconda package repository. More recently, PyPI wheels have also been made available as version 1.6.9.1beta. Because of the beta status, some of the tools are limited or incomplete. Supported Python versions are 3.6, 3.7 and 3.8. The source of the latest release is available at the branch version_1.6, and the development source can be found at devel_1.6.

  • version 1.4.1 -- the old branch, which contains the full functionality of the different analysis tools. It is available via the Anaconda package repository for Python 3.5. PyPI packages are also available for Python 3.4, 3.5 and 2.7. Python 3.6 and later versions are not supported.
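
The version constraints in the list above can be sketched as a small lookup. This is only an illustration: the table `BRANCH_SUPPORT` and the helper `branch_for` are hypothetical names, not part of EstNLTK, and the table copies only the Python versions named in the two list items (the PyPI wheels for 1.6 described under Installation also cover Python 3.9, which is left out here).

```python
import sys
from typing import Optional, Tuple

# Python versions each branch lists as supported, copied from the list above.
# Hypothetical table for illustration; EstNLTK itself ships nothing like it.
BRANCH_SUPPORT = {
    "1.6": {(3, 6), (3, 7), (3, 8)},
    "1.4.1": {(2, 7), (3, 4), (3, 5)},
}


def branch_for(python_version: Tuple[int, int]) -> Optional[str]:
    """Return the branch that lists this (major, minor) version, or None."""
    for branch, versions in BRANCH_SUPPORT.items():
        if python_version in versions:
            return branch
    return None


if __name__ == "__main__":
    # Report which branch, if any, matches the running interpreter.
    print(branch_for(sys.version_info[:2]))
```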

Version 1.6

Installation

The recommended way to install EstNLTK is with the Anaconda Python distribution and Python 3.6+.

Installable packages have been built for osx, windows-64, and linux-64.

Installation steps with conda:

  1. create a conda environment with python 3.8, for instance:
     conda create -n py38 python=3.8
  2. activate the environment, for instance:
     conda activate py38
  3. install EstNLTK with the command:
     conda install -c estnltk -c conda-forge estnltk=1.6.9b
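
After the install steps, one way to confirm that the package landed in the active environment is to ask Python for its installed version. The sketch below is a generic check, not an EstNLTK feature: `installed_version` is a hypothetical helper, and it relies on `importlib.metadata`, which is in the standard library only from Python 3.8 onward. It assumes the package was installed with standard metadata, which is usual but not guaranteed for every packaging route.

```python
from importlib import metadata  # standard library from Python 3.8 onward
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("estnltk")
    print("estnltk not found" if version is None else "estnltk " + version)
```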

Alternatively, you can install EstNLTK from a PyPI wheel.
Wheels are available for windows-64, linux-64 and osx_64, covering Python versions 3.6 to 3.9. The corresponding version is 1.6.9.1beta, and it can be installed with the command:

pip install estnltk==1.6.9.1b0

Note: Versions 1.6.9b0 (conda) and 1.6.9.1b0 (PyPI) are equivalent in their main functionality. Although version 1.6.9b0 (and earlier EstNLTK versions) are also available on PyPI, support for different platforms and Python versions is very limited in those earlier PyPI releases. If you need to use an earlier version of EstNLTK, please use our Anaconda packages.

Note: to use some of the tools in estnltk, you also need to have Java installed on your system. We recommend using Oracle Java (http://www.oracle.com/technetwork/java/javase/downloads/index.html), although alternatives such as OpenJDK (http://openjdk.java.net/) should also work.
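
A quick way to see whether a Java runtime is reachable is to look for the `java` executable on the PATH. The sketch below uses only the standard library; `find_executable` is a hypothetical helper name, and the check assumes the tools that need Java locate it through the PATH, which this page does not state.

```python
import shutil
from typing import Optional


def find_executable(name: str = "java") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    print("Java found at " + path if path else "Java not found on PATH")
```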

Using on Google Colab

You can install EstNLTK in the Google Colab environment with the command:

!pip install estnltk==1.6.9.1b0

Neural models

Neural models of EstNLTK are not distributed with the package, but must be downloaded separately from the repository https://entu.keeleressursid.ee/entity/folder/7510. Neural models for syntactic parsing can be downloaded from https://entu.keeleressursid.ee/entity/folder/9785 (see also the tutorial of syntactic parsers).

Documentation

Documentation for 1.6 currently comes in the form of Jupyter notebooks, which are available here: https://github.com/estnltk/estnltk/tree/version_1.6/tutorials

Additional educational materials on EstNLTK version 1.6 are available on web pages of an NLP course taught at the University of Tartu:

Note: if you have trouble viewing Jupyter notebooks on GitHub (you get the error message "Sorry, something went wrong. Reload?" when loading a notebook), try opening the notebooks with the help of https://nbviewer.jupyter.org
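
As an illustration of the nbviewer workaround, a GitHub address can be turned into an nbviewer address by swapping the URL prefix. The sketch assumes nbviewer's `/github/<user>/<repo>/...` addressing scheme; `to_nbviewer` is a hypothetical helper, not something EstNLTK provides.

```python
GITHUB_PREFIX = "https://github.com/"
NBVIEWER_PREFIX = "https://nbviewer.jupyter.org/github/"


def to_nbviewer(github_url: str) -> str:
    """Rewrite a github.com URL so that it opens through nbviewer."""
    if not github_url.startswith(GITHUB_PREFIX):
        raise ValueError("expected a URL starting with " + GITHUB_PREFIX)
    # Keep everything after the GitHub host: user, repo, branch and path.
    return NBVIEWER_PREFIX + github_url[len(GITHUB_PREFIX):]


if __name__ == "__main__":
    print(to_nbviewer("https://github.com/estnltk/estnltk/tree/version_1.6/tutorials"))
```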

Source

The source of the latest release is available at the branch version_1.6, and the development source can be found at devel_1.6.

Version 1.4.1

Installation

The recommended way to install estnltk v1.4 is with the Anaconda Python distribution and Python 3.5.

We have installable packages built for osx, windows-64, and linux-64. Installation steps:

  1. create a conda environment with python 3.5, for instance:
     conda create -n py35 python=3.5
  2. activate the environment, for instance:
     conda activate py35
  3. install estnltk with the command:
     conda install -c estnltk -c conda-forge nltk=3.4.4 estnltk=1.4.1

Note: to use some of the tools in estnltk, you also need to have Java installed on your system. We recommend using Oracle Java (http://www.oracle.com/technetwork/java/javase/downloads/index.html), although alternatives such as OpenJDK (http://openjdk.java.net/) should also work.

If you have Jupyter Notebook installed, you can use EstNLTK in an interactive web application. To start it, type the command:

jupyter notebook

To run our tutorials, download them as a zip file, unpack them to a directory and run the command jupyter notebook in that directory.
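
The unpacking step can also be done from Python. The sketch below extracts an already downloaded archive and lists the notebooks it contains; `unpack_notebooks` is a hypothetical helper, and the archive name `tutorials.zip` in the usage part is a placeholder, since the actual download file name is not given here.

```python
import zipfile
from pathlib import Path
from typing import List


def unpack_notebooks(zip_path: str, target_dir: str) -> List[Path]:
    """Extract a zip archive into target_dir and return its notebook files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Notebooks may sit in subdirectories, so search recursively.
    return sorted(target.rglob("*.ipynb"))


if __name__ == "__main__":
    # "tutorials.zip" is a placeholder name for the downloaded archive.
    for notebook in unpack_notebooks("tutorials.zip", "estnltk_tutorials"):
        print(notebook)
```

After unpacking, run jupyter notebook in the target directory as described above.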

If you are unable to use the Anaconda distribution, you can alternatively install with pip:

python -m pip install estnltk==1.4.1.1

This is slower, more error-prone and requires you to have the appropriate compilers for building the scientific computation packages for your platform.

Find more details in the installation tutorial for version 1.4.

Documentation

Release 1.4.1 documentation is available at https://estnltk.github.io/estnltk/1.4.1/index.html. For previous versions refer to https://estnltk.github.io/estnltk. For more tools see https://estnltk.github.io.

Additional educational materials on EstNLTK version 1.4 are available on web pages of the NLP courses taught at the University of Tartu:

Source

The source of the latest v1.4 release is available at the master branch.

Citation

If you use EstNLTK 1.6 in your work, please cite us as follows:

@InProceedings{laur-EtAl:2020:LREC,
  author    = {Laur, Sven  and  Orasmaa, Siim  and  Särg, Dage  and  Tammo, Paul},
  title     = {EstNLTK 1.6: Remastered Estonian NLP Pipeline},
  booktitle = {Proceedings of The 12th Language Resources and Evaluation Conference},
  month     = {May},
  year      = {2020},
  address   = {Marseille, France},
  publisher = {European Language Resources Association},
  pages     = {7154--7162},
  url       = {https://www.aclweb.org/anthology/2020.lrec-1.884}
}

If you use EstNLTK 1.4.1 (or older), please cite:

@InProceedings{ORASMAA16.332,
author = {Siim Orasmaa and Timo Petmanson and Alexander Tkachenko and Sven Laur and Heiki-Jaan Kaalep},
title = {EstNLTK - NLP Toolkit for Estonian},
booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
year = {2016},
month = {may},
date = {23-28},
location = {Portorož, Slovenia},
editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
publisher = {European Language Resources Association (ELRA)},
address = {Paris, France},
isbn = {978-2-9517408-9-1},
language = {english}
}
