
A simple iterator for using a set of Chinese tokenizers

Project description

A Collection of Chinese Tokenizers


A simple wrapper around, and collection of, several Chinese tokenizers.

Features

  • TODO

Usage

from tokenizers_collection.config import tokenizer_registry
for name, tokenizer in tokenizer_registry:
    print("Tokenizer: {}".format(name))
    tokenizer('input_file.txt', 'output_file.txt')
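
If you only need one tokenizer rather than all of them, the registry can also serve as a lookup table. A minimal sketch, assuming tokenizer_registry yields (name, callable) pairs as in the snippet above; 'jieba' is only a hypothetical registry name, and the actual names depend on the installed backends:

from tokenizers_collection.config import tokenizer_registry

# Build a name -> tokenizer mapping from the (name, callable) pairs
tokenizers = dict(tokenizer_registry)

# 'jieba' is a hypothetical name; inspect list(tokenizers) to see what is registered
segment = tokenizers.get('jieba')
if segment is not None:
    segment('input_file.txt', 'output_file.txt')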

Installation

pip install tokenizers_collection

Updating license files and downloading models

Some of the bundled tokenizers need a license file to be updated (for example, pynlpir) or model files to be downloaded (for example, pyltp), so an extra step is required after installation. All of these operations are wrapped in a single helper function; just run a command like the following:

python -m tokenizers_collection.helper

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2018-08-28)

  • First release on PyPI.
