zhon 1.1.5

Zhon is a Python library that provides constants commonly used in Chinese text processing.

About

Zhon’s constants are plain strings, so they can be combined directly with Python's re module. For example:

  • Find CJK characters in a string:

    >>> import re
    >>> import zhon.hanzi
    >>> re.findall('[%s]' % zhon.hanzi.characters, 'I broke a plate: 我打破了一个盘子.')
    ['我', '打', '破', '了', '一', '个', '盘', '子']
    
  • Validate Pinyin syllables, words, or sentences:

    >>> import zhon.pinyin
    >>> re.findall(zhon.pinyin.syllable, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuàn', 'zi', 'lǐ', 'tíng', 'zhe', 'yí', 'liàng', 'chē']
    
    >>> re.findall(zhon.pinyin.word, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuànzi', 'lǐ', 'tíngzhe', 'yí', 'liàng', 'chē']
    
    >>> re.findall(zhon.pinyin.sentence, 'Yuànzi lǐ tíngzhe yí liàng chē.', re.I)
    ['Yuànzi lǐ tíngzhe yí liàng chē.']
    
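zhon.hanzi.characters is a string that is valid inside a regular expression character class, which is why the first example wraps it in '[%s]'. The sketch below illustrates the same technique with the standard library only and is written for Python 3. The names CJK_BASIC and find_cjk are hypothetical, and restricting the class to the single range U+4E00 to U+9FFF (the CJK Unified Ideographs block) is an assumption: it covers fewer characters than Zhon's constant.

```python
import re

# Hypothetical, simplified character class: only the CJK Unified Ideographs
# block (U+4E00-U+9FFF). This is an approximation; it is NOT Zhon's
# zhon.hanzi.characters, which covers more Unicode blocks.
CJK_BASIC = '\u4e00-\u9fff'


def find_cjk(text):
    """Return every character of `text` that falls inside CJK_BASIC."""
    # Wrapping the range string in [...] turns it into a character class,
    # the same trick used with zhon.hanzi.characters.
    return re.findall('[%s]' % CJK_BASIC, text)


print(find_cjk('I broke a plate: 我打破了一个盘子.'))
```

On the sample sentence this sketch returns the same eight characters as the Zhon example, because all of them lie in the basic block; rarer ideographs outside that block would be missed.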

Features

  • Includes commonly used constants:
    • CJK characters and radicals
    • Chinese punctuation marks
    • Chinese sentence regular expression pattern
    • Pinyin vowels, consonants, lowercase, uppercase, and punctuation
    • Pinyin syllable, word, and sentence regular expression patterns
    • Zhuyin characters and marks
    • Zhuyin syllable regular expression pattern
    • CC-CEDICT characters
  • Runs on Python 2.7 and 3
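
The punctuation and sentence-pattern features fit together: one plausible way to match a Chinese sentence is a run of non-stop characters followed by one or more sentence-ending marks. The sketch below is a standard-library approximation of that idea for Python 3, not Zhon's actual sentence pattern. STOPS, SENTENCE and split_sentences are hypothetical names, and the three-mark stop set is an assumption; Zhon's own punctuation constants contain many more marks.

```python
import re

# Hypothetical, simplified set of sentence-ending marks: fullwidth full stop,
# exclamation mark and question mark. Zhon defines its own, larger set.
STOPS = '。！？'

# A "sentence" here is one or more non-stop characters followed by
# one or more stop marks (so "！！" stays attached to its sentence).
SENTENCE = '[^%s]+[%s]+' % (STOPS, STOPS)


def split_sentences(text):
    """Return the sentences in `text` that end with a mark from STOPS."""
    return re.findall(SENTENCE, text)


print(split_sentences('我打破了一个盘子。你呢？太好了！！'))
```

Text after the last stop mark is not returned by this pattern; keeping an unterminated final sentence would need an extra optional branch in the expression.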

Getting Started

 
File               Type    Py Version  Uploaded on  Size
zhon-1.1.5.tar.gz  Source              2016-05-23   97KB