
A simple iterator for using a set of Chinese tokenizers

Project description

A Collection of Chinese Tokenizers


A simple wrapper around, and collection of, several Chinese tokenizers.

Features

  • TODO

Usage

from tokenizers_collection.config import tokenizer_registry
for name, tokenizer in tokenizer_registry:
    print("Tokenizer: {}".format(name))
    tokenizer('input_file.txt', 'output_file.txt')
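
If you only need one tokenizer rather than all of them, the registry can also serve as a lookup table. A minimal sketch, assuming tokenizer_registry yields (name, callable) pairs as in the snippet above; 'jieba' is only a hypothetical registry name, and the actual names depend on the installed backends:

from tokenizers_collection.config import tokenizer_registry

# Build a name -> tokenizer mapping from the (name, callable) pairs
tokenizers = dict(tokenizer_registry)

# 'jieba' is a hypothetical name; inspect list(tokenizers) to see what is registered
segment = tokenizers.get('jieba')
if segment is not None:
    segment('input_file.txt', 'output_file.txt')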

Installation

pip install tokenizers_collection

Updating license files and downloading models

Some of the bundled tokenizers need a license file to be updated (for example, pynlpir) or model files to be downloaded (for example, pyltp), so an extra step is required after installation. All of these operations are wrapped in a single helper function; just run a command like the following:

python -m tokenizers_collection.helper

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2018-08-28)

  • First release on PyPI.
