Python binding for nlpO3 Thai language processing library
Project description
Python binding for nlpO3, a Thai natural language processing library in Rust.
- Thai word tokenizer
  - uses a maximal-matching, dictionary-based tokenization algorithm and honors Thai Character Cluster boundaries
  - accepts a user-supplied dictionary
  - 2.5x faster than a similar pure Python implementation
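To illustrate the idea behind maximal-matching tokenization, here is a minimal pure Python sketch that splits a text into the fewest possible dictionary words. It is not nlpO3's implementation: the function name `maximal_matching` is hypothetical, the sketch ignores Thai Character Cluster boundaries, and it simply falls back to single characters for text not covered by the dictionary.

```python
def maximal_matching(text, dictionary):
    """Split text into the fewest tokens, preferring dictionary words.

    Simplified illustration only: no Thai Character Cluster handling.
    """
    n = len(text)
    max_len = max((len(word) for word in dictionary), default=1)
    # best[i] holds (token_count, start_of_last_token) for the prefix text[:i].
    # n + 1 acts as "infinity" because a prefix never needs more than n tokens.
    best = [(0, 0)] + [(n + 1, 0)] * n
    for i in range(n):
        count = best[i][0] + 1
        for j in range(i + 1, min(n, i + max_len) + 1):
            # A single character is always allowed, so unknown text
            # can still be segmented.
            if (j == i + 1 or text[i:j] in dictionary) and count < best[j][0]:
                best[j] = (count, i)
    # Walk back from the end of the text to recover the tokens.
    tokens = []
    end = n
    while end > 0:
        start = best[end][1]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


print(maximal_matching("สวัสดีครับ", {"สวัสดี", "ครับ"}))
```

With a dictionary containing both words, the sketch yields the same two tokens as the usage example on this page; with an empty dictionary it degrades to one token per character.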
pip install nlpo3
Load the file path/to/dict.file into memory and assign it the name custom_dict. Then tokenize a text with the custom_dict dictionary:
from nlpo3 import load_dict, segment
load_dict("path/to/dict.file", "custom_dict")
segment("สวัสดีครับ", "custom_dict")
It returns a list of strings:
['สวัสดี', 'ครับ']
(the result depends on the words included in the dictionary)
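The following sketch shows one way to prepare a dictionary file before calling `load_dict`. It assumes the dictionary is a UTF-8 text file with one word per line, which this page does not state, so check the project documentation for the exact format. The helper names `write_dict_file` and `tokenize_with_words` are hypothetical; only `load_dict` and `segment` come from nlpo3.

```python
import os
import tempfile


def write_dict_file(words, path):
    """Write words to path, one per line, as UTF-8 (assumed file format)."""
    with open(path, "w", encoding="utf-8") as f:
        for word in words:
            f.write(word + "\n")


def tokenize_with_words(text, words, dict_name="custom_dict"):
    """Hypothetical helper: build a temporary dictionary, then segment text."""
    # Imported here so the file-writing helper works without nlpo3 installed.
    from nlpo3 import load_dict, segment

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dict.txt")
        write_dict_file(words, path)
        # load_dict reads the file into memory under the given name.
        load_dict(path, dict_name)
    return segment(text, dict_name)
```

Calling `tokenize_with_words("สวัสดีครับ", ["สวัสดี", "ครับ"])` requires nlpo3 to be installed and, as above, its result depends on the words supplied.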
For more documentation, go to https://github.com/PyThaiNLP/nlpo3
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
nlpo3-1.2.0.tar.gz (9.2 kB)
Built Distributions
nlpo3-1.2.0-cp39-cp39-win_amd64.whl (549.1 kB)
nlpo3-1.2.0-cp39-cp39-win32.whl (492.9 kB)
nlpo3-1.2.0-cp38-cp38-win_amd64.whl (549.3 kB)
nlpo3-1.2.0-cp38-cp38-win32.whl (493.0 kB)
nlpo3-1.2.0-cp37-cp37m-win_amd64.whl (549.2 kB)
nlpo3-1.2.0-cp37-cp37m-win32.whl (493.1 kB)
nlpo3-1.2.0-cp36-cp36m-win_amd64.whl (548.8 kB)
nlpo3-1.2.0-cp36-cp36m-win32.whl (492.9 kB)
Hashes for nlpo3-1.2.0-cp39-cp39-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | c6f8abdda4ab41a31136a46d18d653543a908caee4aa14c9419b927a3f3a58d9
MD5 | 072498960c41d372b45ba4f1e88fcb88
BLAKE2b-256 | e57d77afcf94b148ede1377318312efe22867a8a24eda7cc9bef047e5111b772
Hashes for nlpo3-1.2.0-cp39-cp39-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c0d0c99dffb1ab31d824448098f5a7d82d31da18085c568b1f2ed0b11e27e6b8
MD5 | b126e9bc4ea5d05a16ac7225c829827b
BLAKE2b-256 | eade348c3d0e0b44a0e1148c4614ab09b12e7b4ebc52628798046ba1bab61b7c
Hashes for nlpo3-1.2.0-cp39-cp39-manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 33b65db46edc78ed1017c44f3dadda78886ee5ec4b98c397ceabeaf4823949e6
MD5 | add34e93ae70055db022b0837ddeb9c6
BLAKE2b-256 | 2eab53cc39522481728a778d8eb9a3da3ba547b9024abbfbb3e89e2bc1b9cd19
Hashes for nlpo3-1.2.0-cp39-cp39-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 8c4b531a3d0ff8749160179fd92a74f9d62533741082627338b0e051b9d8ab10
MD5 | fb32abe7fa626434ea66f32110aa6058
BLAKE2b-256 | fa1f574a516169812fd870fe131cf6e88f9494e156c358c6b0135d161f6a20a8
Hashes for nlpo3-1.2.0-cp39-cp39-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 018b1cd50e174173def430a246b57d2a0f16e0884ebce27dfc241a515723cc0b
MD5 | c07ef6ee331349055a18472cad56e623
BLAKE2b-256 | d35a9413052460e9ade57ffb53c87d35079c1dbec24a01e09833fb58e74bb04e
Hashes for nlpo3-1.2.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | b7603b4b6b7ebd9f4469f329646e2c516547af8897011b85ee5577b185f929ea
MD5 | 912f30e1c47dfc4497efee0e719d07fb
BLAKE2b-256 | 06b5398ac61812ac5693142f00eee7d83a5844b8a653bd82f052aa989589654d
Hashes for nlpo3-1.2.0-cp38-cp38-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | caf565838e9a7618e306cb29fa6ac1627e2c000c8f49dba8b05d7791a4af9cc6
MD5 | 8b16671858100ceec8d383fc14e52ea6
BLAKE2b-256 | b81f24a9b1fa777d4e9a17663dfc21da7d22a2bf2b0837396cadcc52f955cf07
Hashes for nlpo3-1.2.0-cp38-cp38-manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | da0019f3d164987507d4f66d82d46fb3a2d6a5c815180346003614d5d0e3d7d1
MD5 | 342184efc43733872a61757667ea18f9
BLAKE2b-256 | 8c33f859c75ccd12f07b98d4be5fbc8a47bd6d90cc1235fb1c7f048254238dc9
Hashes for nlpo3-1.2.0-cp38-cp38-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 8aaa29ad6310916360da4ef71d57c9182d027319ff23c89a792f8926af2092a3
MD5 | ab9b6072c73567e7cc5359d6555679e7
BLAKE2b-256 | fbe9d5c247c76a3343d701ab0cb9082385081412abde7766375ded9445ede125
Hashes for nlpo3-1.2.0-cp38-cp38-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 67ebb4362b6f853495e18bc2a01e6cea371571f65c0c17f13e0b82306c46fc51
MD5 | ff1e2a7188a75fb0b321404c5dc1d585
BLAKE2b-256 | 225cfcf2790f4bca3f3ad8d61652468a7d13a17e89029e9f794e9644b757ded3
Hashes for nlpo3-1.2.0-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 46d8c91dde47471901185fe019595a11b57894e150730e19c598a6b1865ac4f4
MD5 | ec336cb200c52e404f49314c5fcb74bb
BLAKE2b-256 | fc8443ed6e66e1e3a45b6f192d08a2648cc41415aeb16991f27e75de1e0b5b84
Hashes for nlpo3-1.2.0-cp37-cp37m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | bae0bb1c2ba50d652fe94803bdfbdcd56b684b77cbc3da496788ece681da3e24
MD5 | a9897f57ee568d9ab164b89dcabdd63b
BLAKE2b-256 | b976be8dc8288c56ab40a3e4cd9f2d7402cd6a1c91a34aa29b64e574b212f12f
Hashes for nlpo3-1.2.0-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 280b6cc5d5d7f73daed394677dcba14f1220321d652b0713dfb4280fae6181e4
MD5 | 691f9e2eee0a290fd8b46f76f46f44b1
BLAKE2b-256 | 33127b0329884659567de1961ee3c7980aa6b957f3736f8c9cdccecc0ace056e
Hashes for nlpo3-1.2.0-cp37-cp37m-manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | b970de92bd9cdcf78c54dd2fa7da667d46050e53126046e4e208496df9cdfaea
MD5 | adf62b02e2a52df4e3df7978a17b63b8
BLAKE2b-256 | 6df88b808a75e0054e8a80936c83f7bb66927570ec23931d56cc1f09aec86e7b
Hashes for nlpo3-1.2.0-cp37-cp37m-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | ddbe56cde9ea11b489274a61e1e64b5c37b5da44e45e307d5b2fe1ab3f3747cd
MD5 | 0055bb731ad6a19171d5f170d73f6a0a
BLAKE2b-256 | 03c7df9f85a91e852ee13e0ad951df5c395403abe7d9a6550d7f8d8b47176dab
Hashes for nlpo3-1.2.0-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 10b3b5e013f4030e2ab0b0d39b23513412eefcb087e4418e8b4ab8e250e8dcfd
MD5 | a33df4827b9d7e735378e272e94089fc
BLAKE2b-256 | 293aa926aebc1a32cd6590c887da396747475850b7e388322864fdf86ba2a36e
Hashes for nlpo3-1.2.0-cp36-cp36m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | fe85383fa1813381abeb3e0d2c6840440379e24a9880bc0a150ed9663a4ac5ad
MD5 | af64ff8d9cfd2308af0aa849cb729851
BLAKE2b-256 | c51c18324014240083b16a2c6cf66461c892d54e51925c35e692106185a75352
Hashes for nlpo3-1.2.0-cp36-cp36m-win32.whl
Algorithm | Hash digest
---|---
SHA256 | b9ed2fd0e407c19551ade4e204dbd666bbea4cd0806d1875877b39c05bf05483
MD5 | 8387b2a42557d7b0b63864cf5365bed1
BLAKE2b-256 | 8528a47ce25acd5d8e8e8e87a9a67a2fbca24ee8ac02982a6c48a760d8920838
Hashes for nlpo3-1.2.0-cp36-cp36m-manylinux2014_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | a12c1ba2dc8328ce24842de036e5581c11266382ad92045a27c418b51c72a12f
MD5 | 24a952637c8b2a3149e771b251f14e57
BLAKE2b-256 | 70f76af73ea48a6d804b9289c24f25150c6fac497565c72e4cf66c3652186782
Hashes for nlpo3-1.2.0-cp36-cp36m-manylinux2014_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 568da1d035b8f193d3a8849753f074a6cc1d5d113041178d0aeb25584f7ddfc3
MD5 | 45e9a64350257fe30d1c86bb6ff079b6
BLAKE2b-256 | 34f9cb9bc3c480819942352a0504a553a1d0d339a69d0042843797e51525e51f
Hashes for nlpo3-1.2.0-cp36-cp36m-manylinux1_i686.whl
Algorithm | Hash digest
---|---
SHA256 | 07312915ab5b76bcc173b1d2742d62e1479475b1567a4adc764e3229962550aa
MD5 | 273ef95414773ace00f5114fe8ab19e9
BLAKE2b-256 | 8b0c430a8d18732a85688fcef2f2579f7f040d676237cf2a3bd3cb302ff7e677
Hashes for nlpo3-1.2.0-cp36-cp36m-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 442b79d58161afd1ee94d3f789ba0640624d4f1c333b1d38b86abff16e77d56e
MD5 | b69ccf081bd94628b2648d85b20be5cf
BLAKE2b-256 | f202897b3ea77568c5520ca36c99b594474abbe45947e7b3156395da90c2568d
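The digests listed above can be used to check a downloaded file for corruption or tampering. A minimal sketch using Python's standard `hashlib` module follows; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical and not part of nlpo3 or pip.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

To use it, pass the path of a downloaded wheel together with the SHA256 value from the matching table; a `False` result means the file differs from the published one.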