
tinysegmenter3 0.1.0

Super compact Japanese tokenizer


TinySegmenter, a super compact Japanese tokenizer, was originally created for JavaScript by Taku Kudo (c) 2008 under the terms of the new BSD license.
For details, see the original TinySegmenter page.

tinysegmenter for Python 2.x was written by Masato Hagiwara; see his site for more information.

This tinysegmenter has been modified to support both Python 3.x and Python 2.x for distribution by Tatsuro Yasukawa.
Additionally, it has been made faster, thanks to @chezou, @cocoatomo and @methane.

See the tinysegmenter project page for more info.


Installation

pip install tinysegmenter3
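Note that the distribution is named tinysegmenter3, but the module is imported as tinysegmenter, as in the example below.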


Usage

import tinysegmenter
statement = '私はpython大好きStanding Engineerです.'
tokenized_statement = tinysegmenter.tokenize(statement)
# ['私', 'は', 'python', '大好き', 'Standing', ' Engineer', 'です', '.']
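The tokenizer only inserts boundaries; no characters are added or dropped (note the preserved leading space in ' Engineer' above), so joining the tokens reproduces the original string. A quick sanity check using the same example:

assert ''.join(tokenized_statement) == statement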

Test Text

The test text (in the `tests` directory) is The Time Machine by H. G. Wells, translated into Japanese by Hiroo Yamagata under the CC BY-SA 2.0 license.

How to Run Tests

Install the requirements from `requirements.txt`:

pip install -r requirements.txt

then run the tests:
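Assuming the suite in `tests` is pytest-compatible (an assumption; the exact command is not given here, and `requirements.txt` may name a different runner), something like:

python -m pytest tests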