Python binding for nlpO3 Thai language processing library in Rust
nlpO3 Python binding
Python binding for nlpO3, a Thai natural language processing library in Rust.
Features
- Thai word tokenizer
  - segment() - uses a maximal-matching, dictionary-based tokenization algorithm that honors Thai Character Cluster boundaries
    - 2.5x faster than a similar pure-Python implementation (PyThaiNLP's newmm)
  - load_dict() - loads a dictionary from a plain text file (one word per line)
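To illustrate what "maximal matching" means, here is a minimal pure-Python sketch. It is not nlpO3's implementation: it picks the segmentation with the fewest dictionary words using dynamic programming, treats characters not covered by any dictionary word as unknown tokens (an assumption about how gaps are handled), and skips the Thai Character Cluster rules that nlpO3 applies. The function name `maximal_matching` is hypothetical.

```python
from typing import Iterable, List, Tuple


def maximal_matching(text: str, words: Iterable[str]) -> List[str]:
    """Segment text into the fewest dictionary words (simplified sketch).

    Segmentations with fewer unknown characters win first, then those with
    fewer tokens. Thai Character Cluster rules are NOT applied here.
    """
    vocab = {w for w in words if w}
    max_len = max((len(w) for w in vocab), default=1)
    n = len(text)

    # best[i] = (unknown_chars, token_count, start_of_last_token, last_is_known)
    # describes the best segmentation of text[:i].
    best: List[Tuple[int, int, int, bool]] = [(0, 0, 0, True)]
    for end in range(1, n + 1):
        # Fallback: treat the single character text[end - 1] as unknown.
        unk, count, _, _ = best[end - 1]
        candidate = (unk + 1, count + 1, end - 1, False)
        # Try every dictionary word that ends at position `end`.
        for start in range(max(0, end - max_len), end):
            if text[start:end] in vocab:
                unk, count, _, _ = best[start]
                candidate = min(candidate, (unk, count + 1, start, True))
        best.append(candidate)

    # Walk back from the end of the text to recover the chosen pieces.
    pieces: List[Tuple[str, bool]] = []
    end = n
    while end > 0:
        _, _, start, known = best[end]
        pieces.append((text[start:end], known))
        end = start
    pieces.reverse()

    # Merge runs of consecutive unknown characters into a single token.
    tokens: List[str] = []
    prev_known = True
    for piece, known in pieces:
        if not known and not prev_known:
            tokens[-1] += piece
        else:
            tokens.append(piece)
        prev_known = known
    return tokens
```

For example, `maximal_matching("สวัสดีครับ", ["สวัสดี", "ครับ"])` returns `['สวัสดี', 'ครับ']`, the same split shown in the usage example.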
Dictionary file
- In the interest of library size, nlpO3 does not assume which dictionary the developer would like to use, and it does not ship with one. A dictionary is required by the dictionary-based word tokenizer.
- For a tokenization dictionary, try:
  - words_th.txt from PyThaiNLP - around 62,000 words (CC0)
  - word break dictionary from libthai - consists of dictionaries in different categories, with a make script (LGPL-2.1)
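Since load_dict() expects a plain text file with one word per line, a word list can be prepared from Python before loading it. A minimal sketch follows; the helper name `write_dict` is hypothetical, and writing UTF-8 is an assumption (it is Rust's native string encoding, but the expected file encoding is not stated here).

```python
from pathlib import Path
from typing import Iterable


def write_dict(words: Iterable[str], path: str) -> int:
    """Write words in the one-word-per-line format and return how many were written.

    Surrounding whitespace and blank entries are dropped, and duplicates are
    removed while keeping the first-seen order.
    """
    unique = list(dict.fromkeys(w.strip() for w in words if w.strip()))
    # Assumption: the loader reads UTF-8 text.
    Path(path).write_text("".join(w + "\n" for w in unique), encoding="utf-8")
    return len(unique)
```

The resulting file can then be passed to load_dict(), e.g. `write_dict(["สวัสดี", "ครับ"], "path/to/dict.file")` followed by `load_dict("path/to/dict.file", "dict_name")`.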
Install
pip install nlpo3
Usage
Load the file path/to/dict.file into memory and assign it the name dict_name.
Then tokenize a text with the dict_name dictionary:
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "dict_name")
segment("สวัสดีครับ", "dict_name")
It will return a list of strings:
['สวัสดี', 'ครับ']
(the result depends on the words included in the dictionary)
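Putting the two calls together, the sketch below builds a throwaway dictionary from an in-memory word list, loads it, and tokenizes a text. The helper `segment_with_words` is hypothetical; it relies only on `load_dict(path, name)` and `segment(text, name)` as used above, and it deletes the temporary file right after loading, since the dictionary has already been read into memory. The import sits inside the function so this code can be imported where nlpo3 is not installed.

```python
import os
import tempfile
from typing import Iterable, List


def segment_with_words(text: str, words: Iterable[str],
                       dict_name: str = "dict_name") -> List[str]:
    """Tokenize `text` against an ad-hoc word list using nlpo3 (hypothetical helper)."""
    from nlpo3 import load_dict, segment  # requires: pip install nlpo3

    # load_dict() reads from a file, so write the words to a temporary one.
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                     delete=False) as f:
        f.write("".join(w + "\n" for w in words))
        path = f.name
    try:
        load_dict(path, dict_name)
    finally:
        # The dictionary now lives in memory, so the file is no longer needed.
        os.remove(path)
    return segment(text, dict_name)
```

Reusing the same dict_name on a later call is assumed to replace the earlier dictionary; if you need several dictionaries at once, give each its own name.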
Use multithreaded mode, with the same dict_name dictionary:
segment("สวัสดีครับ", dict_name="dict_name", parallel=True)
Use safe mode to avoid long waiting times in some edge cases, such as text with many ambiguous word boundaries:
segment("สวัสดีครับ", dict_name="dict_name", safe=True)
Build
Requirements
- Rust 2018 Edition
- Python 3.6 or newer
- Python Development Headers
  - Ubuntu: sudo apt-get install python3-dev
  - macOS: No action needed
- PyO3 - already included in Cargo.toml
- setuptools-rust
Steps
python -m pip install --upgrade build
python -m build
This should generate a wheel file in the dist/ directory, which can be installed with pip.
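After python -m build, the wheel can also be located and installed from Python. A minimal sketch, assuming the default dist/ output directory and that the most recently modified wheel is the one you want; `newest_wheel` and `install_wheel` are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def newest_wheel(dist_dir: str = "dist") -> Path:
    """Return the most recently modified .whl file in dist_dir."""
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(
            f"no wheel found in {dist_dir!r}; run `python -m build` first")
    return wheels[-1]


def install_wheel(wheel: Path) -> None:
    """Install the wheel with the pip belonging to the running interpreter."""
    subprocess.run([sys.executable, "-m", "pip", "install", str(wheel)], check=True)
```

Usage: `install_wheel(newest_wheel())`. Calling pip through `sys.executable -m pip` ensures the wheel goes into the same environment as the interpreter running the script.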
Issues
Please report issues at https://github.com/PyThaiNLP/nlpo3/issues