Python binding for nlpO3 Thai language processing library
Project description
Python binding for nlpO3, a Thai natural language processing library in Rust.
- Thai word tokenizer
  - Uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
  - Supports user-supplied dictionaries
  - 2.5x faster than a similar pure Python implementation
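To give a feel for dictionary-based tokenization, here is a minimal sketch of a greedy longest-match segmenter in pure Python. It is a simplified illustration only, not nlpO3's implementation: it ignores Thai Character Cluster boundaries, and the function name `longest_match_tokenize` is hypothetical.

```python
def longest_match_tokenize(text, dictionary):
    """Greedy longest-match segmentation against a set of known words."""
    # Longest dictionary entry bounds how far ahead we need to look.
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character and move on.
            tokens.append(text[i])
            i += 1
    return tokens


words = {"สวัสดี", "ครับ"}
print(longest_match_tokenize("สวัสดีครับ", words))  # ['สวัสดี', 'ครับ']
```

A greedy matcher like this can pick a long word early and leave an unsegmentable remainder, which is one reason real tokenizers use more careful matching strategies.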
pip install nlpo3
Load the file path/to/dict.file into memory and assign it the name custom_dict. Then tokenize a text with the custom_dict dictionary:
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "custom_dict")
segment("สวัสดีครับ", "custom_dict")
It returns a list of strings:
['สวัสดี', 'ครับ']
(result depends on words included in the dictionary)
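If you do not yet have a dictionary file, the sketch below shows one way to create a small one for experimenting. It assumes the dictionary is a plain UTF-8 text file with one word per line; check the project documentation for the exact format. The helper `write_dict_file`, the function `demo`, and the name `demo_dict` are hypothetical; only `load_dict` and `segment` come from the usage shown above.

```python
import os
import tempfile


def write_dict_file(words, path):
    """Write one word per line, UTF-8 encoded (assumed dictionary format)."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")


def demo():
    # Imported here so the helper above works even without nlpo3 installed.
    from nlpo3 import load_dict, segment

    path = os.path.join(tempfile.mkdtemp(), "dict.txt")
    write_dict_file(["สวัสดี", "ครับ"], path)
    load_dict(path, "demo_dict")
    return segment("สวัสดีครับ", "demo_dict")
```

With nlpo3 installed, calling `demo()` runs the same two steps as the example above against the temporary dictionary.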
For more documentation, see https://github.com/PyThaiNLP/nlpo3
Download files
Source Distribution
- nlpo3-1.2.1.tar.gz (9.4 kB)
Built Distributions
- nlpo3-1.2.1-cp39-cp39-win_amd64.whl (549.3 kB)
- nlpo3-1.2.1-cp39-cp39-win32.whl (493.0 kB)
- nlpo3-1.2.1-cp38-cp38-win_amd64.whl (549.4 kB)
- nlpo3-1.2.1-cp38-cp38-win32.whl (493.1 kB)
- nlpo3-1.2.1-cp37-cp37m-win_amd64.whl (549.4 kB)
- nlpo3-1.2.1-cp37-cp37m-win32.whl (493.2 kB)
- nlpo3-1.2.1-cp36-cp36m-win_amd64.whl (549.0 kB)
- nlpo3-1.2.1-cp36-cp36m-win32.whl (493.0 kB)