pypinyin

A tool for converting Chinese characters to pinyin.

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- Utilities

Project description

Converts Chinese text to pinyin. Useful for phonetic annotation, sorting, and searching of Chinese characters.

Based on hotoo/pinyin.

Documentation: http://pypinyin.rtfd.org
GitHub: https://github.com/mozillazg/python-pinyin
License: MIT license
PyPI: https://pypi.python.org/pypi/pypinyin
Python version: 2.6, 2.7, pypy, 3.3, 3.4

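Features

Picks the most appropriate pinyin based on the surrounding phrase.
Supports heteronyms (characters with more than one reading).
Basic support for Traditional Chinese.
Supports several pinyin output styles.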

Installation

$ pip install pypinyin

Documentation

For full documentation, see: http://pypinyin.rtfd.org

Usage

>>> from pypinyin import pinyin, lazy_pinyin
>>> import pypinyin
>>> pinyin(u'中心')
[[u'zh\u014dng'], [u'x\u012bn']]
>>> pinyin(u'中心', heteronym=True)  # enable heteronym mode
[[u'zh\u014dng', u'zh\xf2ng'], [u'x\u012bn']]
>>> pinyin(u'中心', style=pypinyin.INITIALS)  # set the pinyin style
[['zh'], ['x']]
>>> pinyin(u'中心', style=pypinyin.TONE2, heteronym=True)
[['zho1ng', 'zho4ng'], ['xi1n']]
>>> lazy_pinyin(u'中心')  # ignore heteronyms
['zhong', 'xin']
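
The TONE2 results above put a tone digit directly after the letter that carries the tone mark in the default style. The following is a minimal sketch of that relationship using only the standard library. It is not pypinyin's implementation, and the helper name to_tone2 is hypothetical.

```python
import unicodedata

# Combining tone marks (as they appear after NFD decomposition) mapped to tone digits.
TONE_DIGITS = {
    "\u0304": "1",  # macron, first tone
    "\u0301": "2",  # acute, second tone
    "\u030c": "3",  # caron, third tone
    "\u0300": "4",  # grave, fourth tone
}


def to_tone2(syllable):
    """Replace the tone mark of a tone-marked syllable with a digit after that letter."""
    decomposed = unicodedata.normalize("NFD", syllable)
    converted = "".join(TONE_DIGITS.get(ch, ch) for ch in decomposed)
    # Recompose so that u + diaeresis becomes a single ü again.
    return unicodedata.normalize("NFC", converted)


print([to_tone2(s) for s in ["zhōng", "zhòng", "xīn"]])
# ['zho1ng', 'zho4ng', 'xi1n']
```

Decomposing first keeps the mapping table to four entries instead of one entry per accented vowel.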

Command-line tool:

$ pypinyin 音乐
yīn yuè
$ pypinyin -h

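Handling characters without pinyin

When the program encounters a character (or string) that has no pinyin, it handles it according to the value of the errors parameter:

default (the default behavior): leave the input untouched and return it as-is:
```
>>> lazy_pinyin(u'你好☆')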
[u'ni', u'hao', u'\u2606']
```

ignore: skip the character:

>>> lazy_pinyin(u'你好☆', errors='ignore')
[u'ni', u'hao']

replace: substitute the character's Unicode code point in hex, without the \u prefix:

>>> lazy_pinyin(u'你好☆', errors='replace')
[u'ni', u'hao', u'2606']

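A callable: supply a callback that receives the character (or string) without pinyin as its argument. Supported return types: unicode, list ([unicode, …]), or None.

See the unit tests for reference.
```
>>> lazy_pinyin(u'你好☆', errors=lambda x: u'star')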
[u'ni', u'hao', u'star']
```
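
The four errors modes above can be pictured as a small dispatcher. This is a sketch under stated assumptions: handle_no_pinyin is a hypothetical helper, not part of pypinyin, and applying the hex conversion character by character for multi-character input is an assumption.

```python
def handle_no_pinyin(chars, errors="default"):
    """Toy dispatcher for a string that has no pinyin (hypothetical helper)."""
    if callable(errors):
        # The callback decides what to return for this string.
        return errors(chars)
    if errors == "default":
        return chars
    if errors == "ignore":
        return None
    if errors == "replace":
        # Hex code point of each character, without the \u prefix.
        return "".join(format(ord(ch), "x") for ch in chars)
    raise ValueError("unknown errors value: %r" % (errors,))


print(handle_no_pinyin("☆"))                            # ☆
print(handle_no_pinyin("☆", errors="ignore"))           # None
print(handle_no_pinyin("☆", errors="replace"))          # 2606
print(handle_no_pinyin("☆", errors=lambda s: "star"))   # star
```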

Word segmentation

A simple built-in segmenter splits the input string according to whether each character is Chinese.
```
>>> from pypinyin import lazy_pinyin
>>> lazy_pinyin(u'你好abcこんにちは')
[u'ni', u'hao', u'abc\u3053\u3093\u306b\u3061\u306f']
```
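To handle heteronyms properly, installing an additional word segmentation module is recommended.
If the jieba segmentation module is installed, it is used automatically.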
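
Splitting a string by "Chinese or not" can be sketched with a single regular expression. This illustrates the idea only and is not pypinyin's code. The helper name split_runs is hypothetical, and treating U+4E00 to U+9FFF (CJK Unified Ideographs) as "Chinese" is an assumption.

```python
import re

# Assumption: a character is "Chinese" if it lies in U+4E00..U+9FFF.
RUNS = re.compile("[\u4e00-\u9fff]+|[^\u4e00-\u9fff]+")


def split_runs(text):
    """Split text into alternating runs of Chinese and non-Chinese characters."""
    return RUNS.findall(text)


print(split_runs("你好abcこんにちは"))
# ['你好', 'abcこんにちは']
```

The non-Chinese run stays in one piece, which matches the example above where abc and the Japanese text come back as a single item.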

Using another segmentation module:

Install a segmentation module, for example pip install snownlp;

Pass a list of pre-segmented strings as the argument:

>>> from pypinyin import lazy_pinyin, TONE2
>>> from snownlp import SnowNLP
>>> hans = u'音乐123'
>>> hans_seg = SnowNLP(hans).words  # segment the text
>>> hans_seg
[u'\u97f3\u4e50', u'123']
>>> lazy_pinyin(hans_seg, style=TONE2)
[u'yi1n', u'yue4', u'123']

Custom pinyin dictionaries

If the results are unsatisfactory, you can correct them by loading a custom pinyin dictionary:

When jieba is installed and the segmenter recognizes the phrase:

>>> from pypinyin import lazy_pinyin, load_phrases_dict, TONE2
>>> hans = u'桔子'
>>> lazy_pinyin(hans, style=TONE2)
[u'jie2', u'zi3']
>>> load_phrases_dict({u'桔子': [[u'jú'], [u'zǐ']]})
>>> lazy_pinyin(hans, style=TONE2)
[u'ju2', u'zi3']

When jieba is not installed, and/or the segmenter does not recognize the phrase:

>>> from pypinyin import lazy_pinyin, load_phrases_dict, TONE2, load_single_dict
>>> hans = u'还没'
>>> lazy_pinyin(hans, style=TONE2)
['hua2n', 'me2i']
>>> # Method 1: define a custom phrase
>>> load_phrases_dict({u'还没': [[u'hái'], [u'méi']]})
>>> lazy_pinyin(u'还没', style=TONE2)
['hua2n', 'me2i']
>>> lazy_pinyin([u'还没'], style=TONE2)  # explicitly treat "还没" as one phrase
['ha2i', 'me2i']
>>> # Method 2: customize the single-character dictionary
>>> load_single_dict({ord(u'还'): u'hái,huán'})  # reorder the readings of "还"
>>> lazy_pinyin(u'还没', style=TONE2)
['ha2i', 'me2i']
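
Why the phrase entry only takes effect once "还没" arrives as a single word can be seen in a toy model of the lookup. This is a sketch, not pypinyin's code: toy_lazy_pinyin and both dictionaries are made up for illustration, and "phrase table first, then the first reading of each character" is an assumed simplification.

```python
def toy_lazy_pinyin(words, phrases, singles):
    """Toy lookup over pre-segmented words: phrase table first, then per character."""
    result = []
    for word in words:
        if word in phrases:
            result.extend(readings[0] for readings in phrases[word])
        else:
            # Fall back to the first listed reading of each character.
            result.extend(singles[ord(ch)].split(",")[0] for ch in word)
    return result


# Illustrative data only, not the library's dictionaries.
singles = {ord("还"): "huán,hái", ord("没"): "méi,mò"}
phrases = {"还没": [["hái"], ["méi"]]}

# Segmented character by character: the phrase entry is never consulted.
print(toy_lazy_pinyin(["还", "没"], phrases, singles))    # ['huán', 'méi']
# Passed as one word: the phrase entry wins.
print(toy_lazy_pinyin(["还没"], phrases, singles))        # ['hái', 'méi']
# Reordering the single-character readings also changes the fallback.
reordered = dict(singles)
reordered[ord("还")] = "hái,huán"
print(toy_lazy_pinyin(["还", "没"], phrases, reordered))  # ['hái', 'méi']
```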

Changelog

0.8.1 (2015-07-04)

bugfix: refactored the built-in segmenter, fixing the failure to handle strings that contain spaces.

0.8.0 (2015-06-27)

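Added a simple built-in segmenter and improved the handling of characters without pinyin (if you do not need heteronym handling, you no longer have to install jieba or another segmentation module):

# Before, with jieba installed
lazy_pinyin(u'你好abc☆☆')
[u'ni', u'hao', 'a', 'b', 'c', u'\u2606', u'\u2606']

# Now, whether or not jieba is installed
lazy_pinyin(u'你好abc☆☆')
[u'ni', u'hao', u'abc\u2606\u2606']

[Changed] When the errors parameter is a callback, its argument is now a single character or a phrase, rather than always a single character.

That is, for the string abc, the errors callback was previously called three times: func('a') ... func('b') ... func('c')

It is now called only once: func('abc').

[Changed] English letters are now also covered by the errors parameter:

# Before
lazy_pinyin(u'abc', errors='ignore')
[u'abc']

# Now
lazy_pinyin(u'abc', errors='ignore')
[]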

0.7.0 (2015-06-20)

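Fixed from pypinyin import * not working under Python 2
Added support for the following environment variables:
- PYPINYIN_NO_JIEBA=true: disable automatic use of the jieba segmentation module
- PYPINYIN_NO_PHRASES=true: disable the built-in phrase pinyin dictionary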
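
The two variables listed above can also be set from Python. The sketch below sets them before importing the package, which is the conservative choice because the point at which the library reads them is an assumption here. The try/except only keeps the snippet runnable when pypinyin is not installed.

```python
import os

# Variable names are taken from the changelog entry above; they apply to the
# releases described here and may not exist in other versions.
os.environ["PYPINYIN_NO_JIEBA"] = "true"
os.environ["PYPINYIN_NO_PHRASES"] = "true"

try:
    import pypinyin  # imported only after the environment is prepared
except ImportError:
    pypinyin = None  # package not installed; nothing else to do in this sketch

print(os.environ["PYPINYIN_NO_JIEBA"], os.environ["PYPINYIN_NO_PHRASES"])
```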

0.6.0 (2015-06-10)

The errors parameter now accepts a callback (#17):

def foobar(char):
    return 'a'
pinyin(u'あ', errors=foobar)

0.5.7 (2015-05-17)

Corrected the readings of some phrases containing “便宜”

0.5.6 (2015-02-26)

Fixed the incorrect pinyin for “苹果”. #11
Trimmed phrases_dict
Fixed jieba being imported repeatedly
Updated the documentation

0.5.5 (2015-01-27)

Fixed an error in phrases_dict

0.5.4 (2014-12-26)

Fixed incorrect handling of mixed Chinese/English phrases produced by a segmentation module (for example B超, 维生素C). #8

0.5.3 (2014-12-07)

Updated the pinyin dictionary

0.5.2 (2014-09-21)

The pinyin dictionary is now loaded as a copy, so the built-in dictionary cannot be corrupted
Fixed the tone marks for 胜败乃兵家常事

0.5.1 (2014-03-09)

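Added the errors parameter to control how characters without pinyin are handled:
- 'default': keep the original character
- 'ignore': skip the character
- 'replace': substitute the Unicode code point as a hex string without the \u prefix (u'\u90aa' => u'90aa')
Only characters matching [^a-zA-Z0-9_] are processed.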

0.5.0 (2014-03-01)

Switched to a new single-character pinyin dictionary, with new content and a new format

New format: {0x963F: u"ā,ē"}

Old format: {u'阿': u"ā,ē"}
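
The difference between the two formats is only the key: the new one uses the character's Unicode code point instead of the character itself. A minimal sketch of converting one into the other follows. The helper name to_codepoint_keys is hypothetical and not part of pypinyin.

```python
def to_codepoint_keys(old_dict):
    """Convert {character: readings} into {code point: readings}."""
    return {ord(char): readings for char, readings in old_dict.items()}


new_dict = to_codepoint_keys({"阿": "ā,ē"})
print(new_dict == {0x963F: "ā,ē"})  # True, because ord("阿") == 0x963F
```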

0.4.4 (2014-01-16)

Cleaned up the command-line output, removing irrelevant information
Fixed “ImportError: No module named runner”

0.4.3 (2014-01-10)

Fixed command-line tool compatibility under Python 3

0.4.2 (2014-01-10)

Removed the STYLE_ prefix from pinyin style names (names with the STYLE_ prefix remain supported)
Added a command-line tool; for usage see: pypinyin -h

0.4.1 (2014-01-04)

Added support for custom pinyin dictionaries, so users can correct the program's results

0.4.0 (2014-01-03)

Changed: jieba is now an optional dependency; users can segment Chinese text with whichever segmentation module they prefer
Added Python 3 support

0.3.1 (2013-12-24)

Added lazy_pinyin

>>> lazy_pinyin(u'中心')
['zhong', 'xin']

0.3.0 (2013-09-26)

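Fixed the initials style mishandling characters whose pinyin has only a final
Added three pinyin styles:
- pypinyin.STYLE_FINALS: finals style 1; returns only the final of each syllable, without tones. For example: ong uo
- pypinyin.STYLE_FINALS_TONE: finals style 2; with tones, marked on the first letter of the final. For example: ōng uó
- pypinyin.STYLE_FINALS_TONE2: finals style 3; with tones, written as a digit [0-4] within each final. For example: o1ng uo2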
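
The split between initial and final that these styles rely on can be sketched as a prefix match. This is an illustration, not pypinyin's implementation: the helper name split_syllable and the initials table (including whether y and w count as initials) are assumptions.

```python
# Assumed initials table; two-letter initials come first so "zh" wins over "z".
INITIALS = "zh ch sh b p m f d t n l g k h j q x r z c s y w".split()


def split_syllable(syllable):
    """Split a toneless syllable into (initial, final); the initial may be empty."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


print(split_syllable("zhong"))  # ('zh', 'ong')
print(split_syllable("guo"))    # ('g', 'uo')
print(split_syllable("an"))     # ('', 'an')
```

The last call shows the case mentioned in the fix above: a syllable with no initial, where the whole syllable is the final.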

0.2.0 (2013-09-22)

Improved support for mixed Chinese/English strings:

>>> pypinyin.pinyin(u'你好abc')
[[u'n\u01d0'], [u'h\u01ceo'], [u'abc']]

0.1.0 (2013-09-21)

Initial Release

Download files

Source Distribution

pypinyin-0.8.1.tar.gz (1.0 MB), uploaded Jul 4, 2015

Built Distribution

pypinyin-0.8.1-py2.py3-none-any.whl (1.0 MB), uploaded Jul 4, 2015, Python 2 and Python 3

Hashes for pypinyin-0.8.1.tar.gz
Algorithm	Hash digest
SHA256	`1611a41d5a31285395112de4ab34587d3f9f51d3873f07094c40dd4ac8e9cb3a`
MD5	`5b37b749443e624c67d64a007c22ffc2`
BLAKE2b-256	`8d8209bf36e86662007a7093fec8d6001e972071888fd1246233c6c1cfea9c42`

Hashes for pypinyin-0.8.1-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`adaf9f444392195c8c911a5b640062275e16ed60ee856397222ccb0e29cbeaed`
MD5	`ebce2ed1199a4df8eed5e8aebdbcc074`
BLAKE2b-256	`eea1e3d8be85c178de68d6121cc79f97f39a25aa240d3d6ee664f6a2ee67e397`
