unicode-charnames

Look up Unicode character name or code point label and search in Unicode character names

Project description

Unicode characters have names that serve as unique identifiers for each character. The character names in the Unicode Standard are identical to those of ISO/IEC 10646.

The unicode-charnames package looks up the character name or code point label of a given Unicode character, and looks up the code point that corresponds to a given character name. It also performs substring searches over Unicode character names. This package supports version 12.1 of the Unicode Standard (137,929 characters).

The generic term “character name” refers to the Unicode character “Name” property value for an encoded Unicode character. For code points that do not have character names (unassigned, reserved code points and other special code point types), the Unicode Standard uses constructed Unicode code point labels, displayed between angle brackets, to stand in for character names.
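
The label construction described above can be sketched with the standard library's unicodedata module. This is an illustration only, not how unicode-charnames is implemented: the helper name code_point_label is hypothetical, and the result reflects the Unicode version bundled with the running Python interpreter rather than version 12.1.

```python
import unicodedata


def code_point_label(cp):
    """Return a constructed label such as '<control-0002>', or None.

    Hypothetical helper for illustration. None means the code point is
    assigned to a graphic or format character, which has a real name.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    category = unicodedata.category(chr(cp))
    if category == "Cc":
        kind = "control"
    elif category == "Co":
        kind = "private-use"
    elif category == "Cs":
        kind = "surrogate"
    elif category == "Cn":
        # Noncharacters are U+FDD0..U+FDEF plus the last two code
        # points of every plane; other Cn code points are reserved.
        is_nonchar = 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE
        kind = "noncharacter" if is_nonchar else "reserved"
    else:
        return None
    # Labels use the usual 4- to 6-digit uppercase hexadecimal format.
    return "<{}-{:04X}>".format(kind, cp)


print(code_point_label(0x0002))  # <control-0002>
```

Which code points count as reserved depends on the Unicode version in use, so that branch may give different answers on different interpreters.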

Features

The library provides:

A function to get the character name (the normative character property “Name”) or the code point label (for characters that do not have character names) of a single Unicode character.
A function to get the code point value (in the usual 4- to 6-digit hexadecimal format) corresponding to a Unicode character name; the search is case-sensitive and requires an exact string match.
A function to search characters by character name; the search is case-insensitive but requires an exact substring match.
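
For readers who want to see the three lookups in terms of the standard library, here is a rough sketch built on unicodedata. It is not the package's implementation: the names name_of, code_point_of and search_names are hypothetical, the data comes from the Unicode version bundled with the interpreter, and code point labels are not produced (characters without a name yield None).

```python
import unicodedata


def name_of(char):
    """Character name of a single character, or None if it has none."""
    return unicodedata.name(char, None)


def code_point_of(name):
    """Hex code point for an exact, case-sensitive name, else None."""
    try:
        char = unicodedata.lookup(name)
    except KeyError:
        return None
    # lookup() may also accept other casings, aliases and named
    # sequences, so confirm the formal name equals the query exactly.
    if len(char) != 1 or unicodedata.name(char, None) != name:
        return None
    return "{:04X}".format(ord(char))


def search_names(substring):
    """Yield (hex code point, name) pairs whose name contains substring."""
    needle = substring.upper()  # names are uppercase, so this ignores case
    for cp in range(0x110000):
        name = unicodedata.name(chr(cp), "")
        if name and needle in name:
            yield "{:04X}".format(cp), name


print(name_of("\u00E5"))
print(code_point_of("LATIN CAPITAL LETTER E WITH ACUTE"))
for pair in search_names("era name"):
    print("\t".join(pair))
```

The substring search scans every code point on each call, which is simple but slow; a real implementation would more likely search a prebuilt name table.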

Example usage:

# -*- coding: utf-8 -*-

from unicode_charnames import (
    charname,
    codepoint,
    search_charnames
)

# charname()
print('charname():\n')
print(charname('龠'))
print(charname('\U0001F60A'))
print(charname('\u00E5'))
print(charname('\u0002'))

# codepoint()
print('\ncodepoint():\n')
print(codepoint('LATIN CAPITAL LETTER E WITH ACUTE'))
print(codepoint('SUPERCALIFRAGILISTICEXPIALIDOCIOUS'))
print(codepoint('SQUARE ERA NAME REIWA'))

# search_charnames()
print('\nsearch_charnames():\n')
for x in search_charnames('era name'):
    print('\t'.join(x))

This produces the following output:

charname():

CJK UNIFIED IDEOGRAPH-9FA0
SMILING FACE WITH SMILING EYES
LATIN SMALL LETTER A WITH RING ABOVE
<control-0002>

codepoint():

00C9
None
32FF

search_charnames():

32FF        SQUARE ERA NAME REIWA
337B        SQUARE ERA NAME HEISEI
337C        SQUARE ERA NAME SYOUWA
337D        SQUARE ERA NAME TAISYOU
337E        SQUARE ERA NAME MEIZI
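
Because the output above shows code points as hexadecimal strings, with None when no character matches, turning a result back into a character takes one extra step. A minimal sketch, assuming the returned value is either such a hex string or None; char_from_hex is a hypothetical helper and not part of the package:

```python
def char_from_hex(hex_code_point):
    """Convert a hex string such as '32FF' to a character.

    None is passed through unchanged, so a failed lookup stays visible.
    """
    if hex_code_point is None:
        return None
    return chr(int(hex_code_point, 16))


for value in ("00C9", None, "32FF"):
    print(repr(char_from_hex(value)))
```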

License

unicode-charnames is released under an MIT license. The full text of the license is available here.

The Unicode Standard v12.1.0 DerivedName.txt file is licensed under the Unicode License Agreement for Data Files and Software. Please consult the UNICODE, INC. LICENSE AGREEMENT prior to use.

Release history

15.1.0 (Nov 11, 2023)
15.0.0 (Oct 1, 2022)
14.0.0 (Oct 9, 2021)
13.0.0 (Apr 12, 2020)
13.0.0rc1, pre-release (Apr 12, 2020)
12.1.0.post1, this version (Aug 29, 2019)

Download files

Source Distribution

unicode_charnames-12.1.0.post1.tar.gz (264.5 kB)

Uploaded Aug 29, 2019 Source

Hashes for unicode_charnames-12.1.0.post1.tar.gz
Algorithm	Hash digest
SHA256	`2643b5a5bcb8b5f07187a437411e4f0d8ac7ab53069b51b535b266ef53c4599f`
MD5	`e8c82f7334b16b54158f2818103dee67`
BLAKE2b-256	`9cc2760b269ed4a13b07f49da8cdaf3d7a4f9e2dbb7903ef17ca0ecf16b0708e`