Python binding for nlpO3 Thai language processing library
Project description
Python binding for nlpO3, a Thai natural language processing library in Rust.
- Thai word tokenizer
  - uses a maximal-matching, dictionary-based tokenization algorithm that honors Thai Character Cluster (TCC) boundaries
  - supports a user-supplied dictionary
  - 2.5x faster than a similar pure-Python implementation
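The maximal-matching idea can be illustrated with a short pure-Python sketch. This is not nlpO3's implementation. It assumes a simple cost model: among all segmentations, prefer the fewest unknown pieces, then the fewest tokens. It also approximates Thai Character Cluster boundaries with one crude rule: never place a boundary right before a Thai above/below vowel or tone mark. Real TCC rules are more detailed. The names `maximal_matching` and `allowed_boundaries` are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

# Thai combining marks: MAI HAN-AKAT (U+0E31), above/below vowels
# (U+0E34..U+0E3A) and tone/diacritic marks (U+0E47..U+0E4E).
# Assumption: forbidding a boundary right before these is a crude
# stand-in for the full Thai Character Cluster (TCC) rules.
COMBINING = {"\u0e31"}
COMBINING |= {chr(c) for c in range(0x0E34, 0x0E3B)}
COMBINING |= {chr(c) for c in range(0x0E47, 0x0E4F)}


def allowed_boundaries(text: str) -> List[bool]:
    """allowed[i] is True if a token may begin or end at index i."""
    n = len(text)
    return [i == 0 or i == n or text[i] not in COMBINING for i in range(n + 1)]


def maximal_matching(text: str, words: Iterable[str]) -> List[str]:
    vocab = {w for w in words if w}
    max_len = max((len(w) for w in vocab), default=1)
    n = len(text)
    allowed = allowed_boundaries(text)

    inf = (float("inf"), float("inf"))
    # best[i] = (unknown pieces, tokens) of the best segmentation of text[:i]
    best: List[Tuple[float, float]] = [inf] * (n + 1)
    # back[i] = (start index, is_dictionary_word) of the last piece ending at i
    back: List[Optional[Tuple[int, bool]]] = [None] * (n + 1)
    best[0] = (0, 0)

    for i in range(n):
        if best[i] == inf:
            continue  # unreachable, e.g. inside a character cluster
        unknown, tokens = best[i]
        # Option 1: a dictionary word text[i:j] ending on an allowed boundary
        for j in range(i + 1, min(n, i + max_len) + 1):
            if allowed[j] and text[i:j] in vocab and (unknown, tokens + 1) < best[j]:
                best[j] = (unknown, tokens + 1)
                back[j] = (i, True)
        # Option 2: an unknown piece running to the next allowed boundary
        k = i + 1
        while not allowed[k]:
            k += 1
        if (unknown + 1, tokens + 1) < best[k]:
            best[k] = (unknown + 1, tokens + 1)
            back[k] = (i, False)

    # Recover the pieces by following back-pointers from the end of the text
    pieces: List[Tuple[str, bool]] = []
    j = n
    while j > 0:
        start, known = back[j]
        pieces.append((text[start:j], known))
        j = start
    pieces.reverse()

    # Merge consecutive unknown pieces into a single token
    result: List[str] = []
    prev_unknown = False
    for piece, known in pieces:
        if not known and prev_unknown:
            result[-1] += piece
        else:
            result.append(piece)
        prev_unknown = not known
    return result
```

For example, with the dictionary `["ab", "abc", "cd"]` the text `"abcd"` segments as `['ab', 'cd']`. Greedy longest matching would take `"abc"` first and leave `"d"` unmatched. The boundary rule also keeps a dictionary word from ending inside a character cluster. With only `"ก"` in the dictionary, `"กับ"` stays one token instead of splitting into `"ก"` and the stray fragment `"ับ"`.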
Install with pip:

```
pip install nlpo3
```
Load the file `path/to/dict.file` into memory and assign it the name `custom_dict`, then tokenize a text with the `custom_dict` dictionary:
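```python
from nlpo3 import load_dict, segment
# Load path/to/dict.file into memory under the name "custom_dict"
load_dict("path/to/dict.file", "custom_dict")
# Tokenize the text with the "custom_dict" dictionary
segment("สวัสดีครับ", "custom_dict")
```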
It returns a list of strings:
['สวัสดี', 'ครับ']
(The result depends on the words included in the dictionary.)
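`load_dict` reads the dictionary from a file on disk, so a word list that lives in Python has to be written out first. The sketch below assumes the format is plain UTF-8 text with one word per line. Check the nlpO3 repository to confirm this format. `write_dict` is a hypothetical helper, not part of nlpo3.

```python
from pathlib import Path
from typing import Iterable, List


def write_dict(words: Iterable[str], path: str) -> Path:
    """Write a word list to `path`, one word per line, UTF-8 encoded.

    Assumption: this matches the dictionary format nlpo3's load_dict expects.
    """
    seen = set()
    cleaned: List[str] = []
    for word in words:
        word = word.strip()  # drop stray whitespace and newlines
        if word and word not in seen:  # skip blanks and duplicates, keep order
            seen.add(word)
            cleaned.append(word)
    out = Path(path)
    out.write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return out
```

The returned path can then be passed to `load_dict`, e.g. `load_dict(str(path), "custom_dict")`.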
For more documentation, see https://github.com/PyThaiNLP/nlpo3
Download files
Source Distribution
- nlpo3-1.2.2.tar.gz (9.6 kB)

Built Distributions
- nlpo3-1.2.2-cp39-cp39-win_amd64.whl (548.9 kB)
- nlpo3-1.2.2-cp39-cp39-win32.whl (492.9 kB)
- nlpo3-1.2.2-cp38-cp38-win_amd64.whl (548.9 kB)
- nlpo3-1.2.2-cp38-cp38-win32.whl (493.0 kB)
- nlpo3-1.2.2-cp37-cp37m-win_amd64.whl (548.9 kB)
- nlpo3-1.2.2-cp37-cp37m-win32.whl (493.1 kB)
- nlpo3-1.2.2-cp36-cp36m-win_amd64.whl (548.6 kB)
- nlpo3-1.2.2-cp36-cp36m-win32.whl (492.9 kB)