Package for NLP of indigenous languages

Development Status
- 2 - Pre-Alpha
Environment
- Console
Intended Audience
License
- OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)
Natural Language
- Spanish
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence
- Utilities

Project description

Py-Elotl

Python package for Natural Language Processing (NLP), focused on low-resource languages spoken in Mexico.

This is a project of Comunidad Elotl.

Developed by:

Paul Aguilar @penserbjorne, paul.aguilar.enriquez@hotmail.com
Robert Pugh @Lguyogiro, robertpugh408@gmail.com

Requires python>=3.X

Development status: Pre-Alpha. See the classifiers.
pip package: elotl
GitHub repository: ElotlMX/py-elotl

Installation

Using `pip`

pip install elotl

From source

git clone https://github.com/ElotlMX/py-elotl.git
cd py-elotl
pip install -e .

Use

Working with corpora

import elotl.corpus

Listing available corpora

Code:

print("Name\t\tDescription")
list_of_corpus = elotl.corpus.list_of_corpus()
for row in list_of_corpus:
    print(row)

Output:

Name		Description
['axolotl', 'Is a Spanish-Nahuatl parallel corpus']
['tsunkua', 'Is a Spanish-otomí parallel corpus']
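Since each row printed above is a two-item [name, description] list, the listing can be turned into a dictionary for lookup by name. The following is a minimal sketch, assuming the rows always have the shape shown in the output. The `rows` literal is copied from that output so the example runs without the package installed, and `describe` is a hypothetical helper name, not part of the package.

```python
# Rows copied from the output above; with the package installed they
# would come from elotl.corpus.list_of_corpus() instead.
rows = [
    ["axolotl", "Is a Spanish-Nahuatl parallel corpus"],
    ["tsunkua", "Is a Spanish-otomí parallel corpus"],
]

# Map each corpus name to its description.
descriptions = {name: description for name, description in rows}


def describe(name):
    """Return the description for a corpus name, or None if unknown."""
    return descriptions.get(name)


print(describe("tsunkua"))
print(describe("axolotlr"))  # misspelled on purpose, not in the listing
```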

Loading a corpus

If a non-existent corpus is requested, a value of 0 is returned.

axolotl = elotl.corpus.load('axolotlr')
if axolotl == 0:
    print("The name entered does not correspond to any corpus")
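Because the failure signal is the value 0 rather than an exception, callers may prefer to turn it into an error at the point of loading. The following is a minimal sketch, assuming only the behavior described above (0 for an unknown name, a list otherwise). `load_or_raise` and `fake_load` are hypothetical names; `fake_load` is a stand-in for `elotl.corpus.load` with made-up data so that the example runs without the package.

```python
def load_or_raise(name, loader):
    """Call loader(name) and raise instead of returning the 0 sentinel."""
    corpus = loader(name)
    # 0 is the documented "corpus not found" value.
    if corpus == 0:
        raise ValueError(f"The name {name!r} does not correspond to any corpus")
    return corpus


def fake_load(name):
    """Stand-in for elotl.corpus.load (hypothetical data)."""
    known = {"axolotl": [["es text", "nah text", "", "doc"]]}
    return known.get(name, 0)


print(len(load_or_raise("axolotl", fake_load)))
try:
    load_or_raise("axolotlr", fake_load)
except ValueError as error:
    print(error)
```

With the package installed, `load_or_raise("axolotl", elotl.corpus.load)` would be the equivalent call.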

If an existing corpus is entered, a list is returned.

axolotl = elotl.corpus.load('axolotl')
for row in axolotl:
    print(row)

['Hay que adivinar: un pozo, a la mitad del cerro, te vas a encontrar.', 'See tosaasaanil, see tosaasaanil. Tias iipan see tepeetl, iitlakotian tepeetl, tikoonextis san see aameyalli.', '', 'Adivinanzas nahuas']

Each element of the list has four indices:

non_original_language
original_language
variant
document_name

tsunkua = elotl.corpus.load('tsunkua')
for row in tsunkua:
    print(row[0]) # language 1
    print(row[1]) # language 2
    print(row[2]) # variant
    print(row[3]) # document

Una vez una señora se emborrachó
nándi na ra t'u̱xú bintí
Otomí del Estado de México (ots)
El otomí de toluca, Yolanda Lastra
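Positional indices are easy to mix up, so the four fields can be given names. The following is a minimal sketch using `collections.namedtuple`. The field names follow the list above, `Row` is a hypothetical name that the package does not provide, and the sample row is copied from the output above so the example runs without the package.

```python
from collections import namedtuple

# Field order follows the four indices listed above.
Row = namedtuple(
    "Row",
    ["non_original_language", "original_language", "variant", "document_name"],
)

# Sample row copied from the tsunkua output above.
raw_row = [
    "Una vez una señora se emborrachó",
    "nándi na ra t'u̱xú bintí",
    "Otomí del Estado de México (ots)",
    "El otomí de toluca, Yolanda Lastra",
]

row = Row(*raw_row)
print(row.original_language)
print(row.document_name)
```

A named tuple still supports `row[0]` to `row[3]`, so code written against the plain lists keeps working.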

Normalizing Nahuatl orthographies

Import the orthography module and load the axolotl Nahuatl corpus.

import elotl.corpus
import elotl.nahuatl.orthography
a = elotl.corpus.load("axolotl")

Create a normalizer object, passing the normalization to use as a parameter.

The following normalizations are currently available:

sep
- Alphabet often seen in use by the Secretaría de Educación Pública (SEP) and the Instituto Nacional para la Educación de los Adultos (INEA). Important characteristics of this alphabet are the use of "u" for the phoneme /w/, "k" for /k/, and "j" for /h/.
inali
- Alphabet in use by the Instituto Nacional de Lenguas Indígenas. Uses "w" for /w/, "k" for /k/, and "h" for /h/.
ack
- Alphabet initially used by Richard Andrews and subsequently by a number of other Nahuatl scholars. Named after Andrews, Campbell, and Karttunen. Uses "hu" for /w/, "c" and "qu" for /k/, and "h" for /h/.

If an unsupported normalization is specified, sep will be used by default.
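The conventions above, together with the fallback to sep, can be summarized as a small lookup table. This is only an illustrative sketch of the description. It does not reflect how the package implements normalization, and `GRAPHEMES`, `resolve_normalization`, and `graphemes_for` are hypothetical names.

```python
# Graphemes used for each phoneme, as described in the list above.
GRAPHEMES = {
    "sep": {"w": ["u"], "k": ["k"], "h": ["j"]},
    "inali": {"w": ["w"], "k": ["k"], "h": ["h"]},
    "ack": {"w": ["hu"], "k": ["c", "qu"], "h": ["h"]},
}

DEFAULT = "sep"


def resolve_normalization(name):
    """Return name if it is supported, otherwise the default ("sep")."""
    return name if name in GRAPHEMES else DEFAULT


def graphemes_for(name, phoneme):
    """Look up how a normalization writes a phoneme."""
    return GRAPHEMES[resolve_normalization(name)][phoneme]


print(graphemes_for("ack", "k"))
print(graphemes_for("unknown", "h"))  # unsupported name falls back to sep
```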

Use the normalize method to convert a text to the selected orthography, and the to_phones method to get its phonemes.

>>> n = elotl.nahuatl.orthography.Normalizer("sep")
>>> n.normalize(a[1][1])
'au in ye yujki in on tlenamakak niman ye ik teixpan on motlalia se tlakatl itech mokaua.'
>>> n.to_phones(a[1][1])
'aw in ye yuʔki in on ƛenamakak niman ye ik teiʃpan on moƛalia se ƛakaƛ itet͡ʃ mokawa.'

>>> n = elotl.nahuatl.orthography.Normalizer("inali")
>>> n.normalize(a[1][1])
'aw in ye yuhki in on tlenamakak niman ye ik teixpan on motlalia se tlakatl itech mokawa.'
>>> n.to_phones(a[1][1])
'aw in ye yuʔki in on ƛenamakak niman ye ik teiʃpan on moƛalia se ƛakaƛ itet͡ʃ mokawa.'

>>> n = elotl.nahuatl.orthography.Normalizer("ack")
>>> n.normalize(a[1][1])
'auh in ye yuhqui in on tlenamacac niman ye ic teixpan on motlalia ce tlacatl itech mocahua.'
>>> n.to_phones(a[1][1])
'aw in ye yuʔki in on ƛenamakak niman ye ik teiʃpan on moƛalia se ƛakaƛ itet͡ʃ mokawa.'
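To see where two orthographies diverge, their outputs can be compared word by word. The following is a minimal sketch that uses the sep and ack strings copied from the session above, so it runs without the package. The variable names are illustrative only.

```python
# Outputs copied from the session above for the same source sentence.
sep = "au in ye yujki in on tlenamakak niman ye ik teixpan on motlalia se tlakatl itech mokaua."
ack = "auh in ye yuhqui in on tlenamacac niman ye ic teixpan on motlalia ce tlacatl itech mocahua."

sep_words = sep.split()
ack_words = ack.split()

# Both outputs have the same number of words, so they can be paired up.
assert len(sep_words) == len(ack_words)

# Keep only the word pairs that are spelled differently.
differences = [(s, a) for s, a in zip(sep_words, ack_words) if s != a]
for s, a in differences:
    print(f"{s:12} {a}")
```

The differing pairs reflect the conventions listed earlier, for example "k" in sep against "c" and "qu" in ack.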

Package structure

The following structure is a reference. As the package grows it will be better documented.

elotl/                              Top-level package
          __init__.py               Initialize the package
          corpora/                  Corpus data
          corpus/                   Subpackage to load corpora
          nahuatl/                  Nahuatl language subpackage
                  orthography.py    Module to normalize Nahuatl orthography and phonemes
          utils/                    Subpackage with useful functions and files
                  fst/              Finite State Transducer functions
                        att/        Module with static .att files
test/                               Unit test scripts

Development

Requirements

python3
HFST
GNU make
virtualenv
Python packages
- setuptools
- wheel

Quick build

virtualenv --python=/usr/bin/python3 venv
source venv/bin/activate
make all

Step by step

Build FSTs

Build the FSTs with make.

make fst

Create a virtual environment and activate it.

virtualenv --python=/usr/bin/python3 venv
source venv/bin/activate

Update `pip` and generate distribution files.

python -m pip install --upgrade pip
python -m pip install --upgrade setuptools wheel
rm -rf build/ dist/
python setup.py clean sdist bdist_wheel

Testing the package locally

python -m pip install -e .

Send to PyPI

python -m pip install twine
twine upload dist/*

License

Mozilla Public License 2.0 (MPL 2.0)


Release history

- 0.0.1.16: Sep 17, 2021 (this version)
- 0.0.1.15: Sep 3, 2021
- 0.0.1.14: Sep 3, 2021
- 0.0.1.13: Sep 3, 2021
- 0.0.1.12: Sep 3, 2021
- 0.0.1.11: Sep 3, 2021
- 0.0.1.10: Aug 25, 2020
- 0.0.1.9: Aug 25, 2020
- 0.0.1.7: Aug 25, 2020
- 0.0.1.6: Aug 21, 2020
- 0.0.1.5: Jul 30, 2020
- 0.0.1.4: Jul 18, 2020
- 0.0.1.3: Jul 18, 2020
- 0.0.1.2: Jul 11, 2020
- 0.0.1.1: Jul 11, 2020
- 0.0.1: Jul 11, 2020

Download files

Source Distribution

elotl-0.0.1.16.tar.gz (2.2 MB), uploaded Sep 17, 2021

Built Distribution

elotl-0.0.1.16-py3-none-any.whl (2.2 MB, Python 3), uploaded Sep 17, 2021

Hashes for elotl-0.0.1.16.tar.gz

Algorithm	Hash digest
SHA256	`31633eb5dde35eabce4157e681dbcc674899fdd7e35b71189a620fd5c4536f5b`
MD5	`4c8c21ca33bcc29839c1cfa4c111289b`
BLAKE2b-256	`8722c31f3bc13c47d7c43dbf779abf95ad8e50ecb79dff0abe3ea9f3d469862f`

Hashes for elotl-0.0.1.16-py3-none-any.whl

Algorithm	Hash digest
SHA256	`c9975f6f7ee285251d151f7fae2233f414eb0ae55b4d27c87134f3fa157babcb`
MD5	`1f27170d5c6f7868168ac593d9946137`
BLAKE2b-256	`e5c2b1661f66e5b6109ac39feed0a9d411220aaf6e665b84a27e4bf8789c27ef`
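A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using `hashlib` from the standard library. `sha256_of` and `matches_sha256` are hypothetical helper names, and the file path passed in is whatever location the file was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_sha256("elotl-0.0.1.16.tar.gz", "31633eb5dde35eabce4157e681dbcc674899fdd7e35b71189a620fd5c4536f5b")` would be expected to return True for an intact copy of the source distribution.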
