Python binding for nlpO3 Thai language processing library
Python binding for nlpO3, a Thai natural language processing library in Rust.
- Thai word tokenizer
  - uses a maximal-matching, dictionary-based tokenization algorithm that honors Thai Character Cluster (TCC) boundaries
  - about 2x faster than a similar pure-Python implementation
  - includes a built-in dictionary (62,000 words, copied from PyThaiNLP)
  - supports custom dictionaries
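To illustrate the idea behind maximal matching, here is a minimal pure-Python sketch. It is not nlpO3's Rust implementation, and the function name maximal_matching is hypothetical. Among all ways to split the text into dictionary words, it picks one with the fewest words, falling back to single-character tokens for text the dictionary does not cover. It deliberately ignores Thai Character Cluster boundaries, which the real tokenizer also enforces.

```python
from __future__ import annotations

import math


def maximal_matching(text: str, dictionary: set[str]) -> list[str]:
    """Split text into dictionary words, using as few words as possible.

    Characters that no dictionary word covers become single-character
    tokens; segmentations with fewer such unknown characters win first,
    then those with fewer tokens overall.
    """
    n = len(text)
    max_len = max((len(w) for w in dictionary), default=1)
    # best[i] = ((unknown_chars, token_count), start of last token) for text[:i]
    best = [((math.inf, math.inf), -1)] * (n + 1)
    best[0] = ((0, 0), -1)
    for i in range(n):
        unknown, tokens = best[i][0]
        # Try every dictionary word that starts at position i.
        for j in range(i + 1, min(n, i + max_len) + 1):
            if text[i:j] in dictionary:
                candidate = (unknown, tokens + 1)
                if candidate < best[j][0]:
                    best[j] = (candidate, i)
        # Fallback: emit text[i] as an unknown single-character token.
        # This keeps every position reachable, so best[i] is never infinite here.
        candidate = (unknown + 1, tokens + 1)
        if candidate < best[i + 1][0]:
            best[i + 1] = (candidate, i)
    # Walk back from the end of the text to recover the chosen tokens.
    tokens_out: list[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        tokens_out.append(text[start:end])
        end = start
    return tokens_out[::-1]
```

With the two-word dictionary {"สวัสดี", "ครับ"}, maximal_matching("สวัสดีครับ", {"สวัสดี", "ครับ"}) returns ["สวัสดี", "ครับ"]. The dynamic program runs in O(n * L) dictionary lookups, where L is the length of the longest dictionary word.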
Install with pip:
pip install nlpo3
Tokenization using the default dictionary:
from nlpo3 import segment
segment("สวัสดีครับ") # returns ["สวัสดี", "ครับ"]
Load the file path/to/dict.file into memory and assign it the name custom_dict. Then tokenize a text with the custom_dict dictionary:
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "custom_dict")  # load the file and register it as "custom_dict"
segment("สวัสดีครับ", "custom_dict")  # tokenize using the custom_dict dictionary
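The custom dictionary is a plain-text file. Assuming a UTF-8 file with one word per line (an assumption here; check the nlpO3 documentation for the exact format load_dict expects), a file like path/to/dict.file could be prepared with a small helper such as the hypothetical write_dict_file below, which strips surrounding whitespace and drops blank and duplicate entries.

```python
from __future__ import annotations

from typing import Iterable


def write_dict_file(words: Iterable[str], path: str) -> int:
    """Write words to path, one unique non-empty word per line (UTF-8).

    Hypothetical helper; the one-word-per-line format is an assumption.
    Returns the number of words written.
    """
    seen: set[str] = set()
    unique_words: list[str] = []
    for word in words:
        word = word.strip()
        if word and word not in seen:
            seen.add(word)
            unique_words.append(word)
    # newline="\n" keeps LF line endings on every platform.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("".join(w + "\n" for w in unique_words))
    return len(unique_words)
```

The resulting path can then be passed to load_dict as in the example above.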
For more documentation, see https://github.com/PyThaiNLP/nlpo3
Download files
Download the file for your platform.
Source Distribution
nlpo3-1.1.2.tar.gz (8.9 kB)
Built Distributions
nlpo3-1.1.2-cp39-cp39-win_amd64.whl (849.4 kB)
nlpo3-1.1.2-cp39-cp39-win32.whl (801.2 kB)
nlpo3-1.1.2-cp38-cp38-win_amd64.whl (849.4 kB)
nlpo3-1.1.2-cp38-cp38-win32.whl (801.2 kB)
nlpo3-1.1.2-cp37-cp37m-win_amd64.whl (849.5 kB)
nlpo3-1.1.2-cp37-cp37m-win32.whl (801.4 kB)
nlpo3-1.1.2-cp36-cp36m-win_amd64.whl (849.6 kB)
nlpo3-1.1.2-cp36-cp36m-win32.whl (801.8 kB)
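Wheel filenames encode which interpreter and platform each file targets. Per PEP 427 the pattern is {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl, so cp39-cp39-win_amd64 means CPython 3.9 on 64-bit Windows. The sketch below (helper names are hypothetical) splits a filename into those parts and does a rough check of the Python tag against the running interpreter; pip's real compatibility check also considers the ABI and platform tags.

```python
import sys
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(name, version, build, py, abi, plat)


def matches_current_python(filename: str) -> bool:
    """Rough check: does the wheel's CPython tag match the running interpreter?

    Only the python tag (e.g. "cp39") is compared; ABI and platform
    tags are ignored, unlike pip's full compatibility check.
    """
    current = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return parse_wheel_filename(filename).python_tag == current
```

For example, parse_wheel_filename("nlpo3-1.1.2-cp37-cp37m-win32.whl") yields python tag "cp37", ABI tag "cp37m" and platform tag "win32".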
Hashes for nlpo3-1.1.2-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 28a291bb81a08cd22137c0f07b696fc682ed01fe562f230caa964042a720475a
MD5 | 3d524aafdc5008a5e61f60324dbe5258
BLAKE2b-256 | f94193671771df46e5d82a9256a7a2e2262300c41dd4c804a190425b05695ac7
Hashes for nlpo3-1.1.2-cp39-cp39-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 7580455349585e82ef7d292d9856b9fc4235feb101a78c300dc94b72e6115e57
MD5 | efbc3b9bf844a5d65a0e5d91163c1de7
BLAKE2b-256 | 0232aed458a2eb3572d8d95100fbb314eac9a5ceb9d9f4bcb9a20ee10c3b4dec
Hashes for nlpo3-1.1.2-cp39-cp39-manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | a57feef0061686da32652c6b305223becc6e8acded09e831ad8c2464d672d212
MD5 | 787e267f842b5d2756219cafa6a4445b
BLAKE2b-256 | 8f59710b815f8ce87a2fbf41329b012e274d3371676d2c8951dfdbd1bb25d326
Hashes for nlpo3-1.1.2-cp39-cp39-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | f6e78bd80d9eab6a500053eb4e88bb5580c6c7ff5adbdb84deba6dc2cdb66b33
MD5 | 6a7bd16fe72af28a3627da9b12d452bc
BLAKE2b-256 | 48d471299ffb6ff615b8ab2b61426e806ceb26c93c21ddcba91dbaa49381b00d
Hashes for nlpo3-1.1.2-cp39-cp39-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | b19cbd7d5c19861dc19e38f8657db99920f453fe548845b2059d2d52f5ff3958
MD5 | cc83a6574165f9860ca4da4fa29b23e1
BLAKE2b-256 | 19c0e675088ef806ffa9802a90b3da6781a6c465c5113b42781c7599d6d43604
Hashes for nlpo3-1.1.2-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 2df7389ed537ecc66064be6c1e1ff2df0522e286402ccc94309d5928032fb251
MD5 | b34df907b3b23260d7d0e84fb4b66d25
BLAKE2b-256 | f67ea78858d680991cf7426d27d8712f15be5a3576bbebab9ca1bba057d5bcb9
Hashes for nlpo3-1.1.2-cp38-cp38-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | df15f7878c1e8a17ab383e1446a813166d4e676682e7609752a68f591ca9426b
MD5 | 4a0d8a8a2f014507b4e2e3fdc6774933
BLAKE2b-256 | 5317f9c194ec8958ff73d840da0f1ab705a251a650eff931c26954401c404aa4
Hashes for nlpo3-1.1.2-cp38-cp38-manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | bd2b7ca048579c8f7015b79906f40be4712d64a44cf3765de2781b11de9b5992
MD5 | c98ab965e8df54ca48dd8be02f072b30
BLAKE2b-256 | 12d138c032889475493a37fc067060fb4b43e4cb297b4532406aeab7f365aedb
Hashes for nlpo3-1.1.2-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | fad194467f0bf18bce76cf9e593e56c733b8407f3fc210abf209818bf0f82fc6
MD5 | a1c9d7d8e54c6af5552d37ab5c7ab9c8
BLAKE2b-256 | 9d9719c5b2d32a44afd8618585c2b45e146db15fc576f51b8d04affc8bfbc027
Hashes for nlpo3-1.1.2-cp38-cp38-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 5488b43180d4d9b3bd07ddc58c6e105316983a483a52cc7a9ad523e9e2e60892
MD5 | 1488633784df06ea4dd3d757ce76ecb8
BLAKE2b-256 | f3e54579be47f7a7295133fe6bcc19992046b2c30ae44ddf3075260249e23d4c
Hashes for nlpo3-1.1.2-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 75a38219dfc5513bd4be9a3add7c018364ded2c710cf5e40be62dca43bf2016e
MD5 | 02f1fa1128d7025732c0c6f424c0d7c7
BLAKE2b-256 | b9791b73e651aadf60b288d7b06abe4b5387d2b7fd4c9c056a230ac1693af34e
Hashes for nlpo3-1.1.2-cp37-cp37m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 63bede40241b70b24c1ccb5e4b6d42ba561677cbad1ba59925cb4abdea45d04e
MD5 | 0cdd3fb79f83639c76d81c0f7f11826f
BLAKE2b-256 | d4586f5091ba23221ed5db46c497323fc4cf544db474ea32a7e7f0a17fb5a1bc
Hashes for nlpo3-1.1.2-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | d9dfddad2e15fb19d5be40c3dc39ff70379c411d75739088082c97700e163c8c
MD5 | b059e2c4bb2cda2988edb7b8e58fecac
BLAKE2b-256 | 2777527cf59b3f90f4cb31b100c6fc512d4e422d62dcbd6cc24752867f950872
Hashes for nlpo3-1.1.2-cp37-cp37m-manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 48659e72234818af1168262860df92ae2d2101483af3092a264fe126dbbf1c66
MD5 | ebb4e8cac5e1250cf137a3917d186cf9
BLAKE2b-256 | 85ffe7df92528b39ce56aa012550cd2fac5b70bdd39a68cc6d504418cfa189e1
Hashes for nlpo3-1.1.2-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 813e05cac1fa803abb60a387dd0e006003d7114313e2e7b8cd10e5d846d77c7b
MD5 | 868dd6c46a26a145e213482ec5e3fbeb
BLAKE2b-256 | a2447e752ae056cf04933c96772d144be94a6813195beb533273e305192fe130
Hashes for nlpo3-1.1.2-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 064fd98686171df2a75f350a33fd6c9b3db4b176918f71630a0b88d6a840b1ce
MD5 | 6146607f4b039c1d7e9fe282f1940f45
BLAKE2b-256 | 6fe6c9e2fb077f04d64d36848fb80a5ef85231be006ce1403e785b8075a02a11
Hashes for nlpo3-1.1.2-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 1635e0cb8753fe363a4e66c80c6fde9665d4c5a487bca21f8f21b4f87fb55ddb
MD5 | 9294b3ae6d390bd6aac0854d26de35a5
BLAKE2b-256 | d7a23f4cdf42833f3c24280100ffc6248619377497a7ac7481d8acf5eeafa7fa
Hashes for nlpo3-1.1.2-cp36-cp36m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | 455660ba3c83504f73862c609dda9468ca9075e0a3ab8e8c8eb7ca518358e8e5
MD5 | cd9043012dc3519e3ad5ad1974768e2e
BLAKE2b-256 | f2250047cfcc90dc7628a8cd3ae56d2a3d2d331157f61b82bb9189cc07ec932b
Hashes for nlpo3-1.1.2-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 650bb48a1a7fa0595fe526146784d021f29540ce4e71ee6c48464a5aa014348c
MD5 | 416e89b495764f39d5ad7de16df9c2b8
BLAKE2b-256 | 005609cb1b844fad2a0fad33a18be1a1ae91e1788b581d52467744b098e11303
Hashes for nlpo3-1.1.2-cp36-cp36m-manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | d59ac306242f24e30c5482d5c1807459ac43ad3d1f38f361fec2b63267724ae5
MD5 | b45454aa2d6d67d3ba981a357bed1838
BLAKE2b-256 | de2ea33898a509ed112017d58969f8cce737367d3cab74213f4f49ebd30f903f
Hashes for nlpo3-1.1.2-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | d3f7ea1bbd4d4254db7f64bb1b562b437a650454ee96a5073a6ca60ea64d0005
MD5 | 4472a430bf23b854fe11d3f7b2200526
BLAKE2b-256 | 835f43ba305b3d4032491f5e59c6f1a02bfea5a86dd03a59698a4f7f7aa84efc
Hashes for nlpo3-1.1.2-cp36-cp36m-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 10f74344377bcf9a0e125b3dd0d873fd7f2690203a482d1c441488bbb43a4103
MD5 | 45ecb352e491b52812e91738f5c47855
BLAKE2b-256 | 1295df32c3853c2b95ca244cfaf10941a6c6e9d4b9de765583cb13e9e1d202ec
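To check a downloaded file against the digests above, one option is to recompute them with Python's standard hashlib module; BLAKE2b-256 corresponds to hashlib.blake2b with a 32-byte digest. A minimal sketch follows; the function names file_digests and verify_sha256 are hypothetical.

```python
from __future__ import annotations

import hashlib


def file_digests(path: str, chunk_size: int = 1 << 16) -> dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of one file."""
    sha256 = hashlib.sha256()
    md5 = hashlib.md5()
    blake2b_256 = hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit digest
    with open(path, "rb") as f:
        # Read in chunks so large wheels are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in (sha256, md5, blake2b_256):
                h.update(chunk)
    return {
        "SHA256": sha256.hexdigest(),
        "MD5": md5.hexdigest(),
        "BLAKE2b-256": blake2b_256.hexdigest(),
    }


def verify_sha256(path: str, expected: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return file_digests(path)["SHA256"] == expected.strip().lower()
```

For an intact download, verify_sha256("nlpo3-1.1.2-cp39-cp39-win_amd64.whl", "28a291bb81a08cd22137c0f07b696fc682ed01fe562f230caa964042a720475a") should return True. SHA256 is the digest to prefer; MD5 is listed for completeness but is not collision-resistant.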