Unicode normalization filters

Converts UTF-8 input to UTF-8 output in the desired Unicode normalization form.

Read about the Unicode Normalization Forms!

Usage

Five executables are included, all with exactly the same usage and arguments:

  • unormalize

  • nfc

  • nfd

  • nfkc

  • nfkd

You may either redirect or pipe input into unormalize (and its buddies), or provide filenames as arguments.
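The two input modes can be pictured with a short Python sketch. It is an illustration built on the standard library's unicodedata module, not the tool's actual source; the names normalize_stream and main are hypothetical, and writing the converted files to standard output is an assumption.

```python
import sys
import unicodedata


def normalize_stream(infile, outfile, form):
    """Copy text from infile to outfile, normalizing each line to `form`."""
    for line in infile:
        outfile.write(unicodedata.normalize(form, line))


def main(argv, form):
    """Hypothetical driver: filter stdin when no filenames are given."""
    paths = argv[1:]
    if not paths:
        # Redirected or piped input arrives on standard input.
        normalize_stream(sys.stdin, sys.stdout, form)
        return
    for path in paths:
        # Otherwise each argument is treated as a UTF-8 text file.
        with open(path, encoding="utf-8") as infile:
            normalize_stream(infile, sys.stdout, form)
```

Calling main(sys.argv, "NFC") from a script would then behave roughly like a filter that accepts either piped input or filenames.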

Options

-f FORM/--form=FORM

Selects the normalization form: one of NFC, NFD, NFKC, or NFKD. The equivalently named executables imply their respective normalization form; unormalize is equivalent to nfk without the --form argument.
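To see how the four forms differ, here is a minimal Python sketch using the standard library's unicodedata.normalize. The helper normalize_text and the sample string are made up for illustration and are not part of unormalize.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalize_text(text, form):
    """Return `text` in the given normalization form (hypothetical helper)."""
    if form not in FORMS:
        raise ValueError(f"unknown normalization form: {form!r}")
    return unicodedata.normalize(form, text)


# Sample: precomposed e-acute (U+00E9) followed by the "fi" ligature (U+FB01).
sample = "\u00e9\ufb01"
for form in FORMS:
    code_points = [f"U+{ord(ch):04X}" for ch in normalize_text(sample, form)]
    print(form, code_points)
# NFC  keeps U+00E9 and U+FB01 as they are.
# NFD  splits U+00E9 into U+0065 U+0301 and keeps the ligature.
# NFKC keeps U+00E9 and expands the ligature to U+0066 U+0069.
# NFKD splits the accent and expands the ligature.
```

The K forms apply compatibility mappings (such as expanding ligatures), while the C and D forms differ only in whether characters end up composed or decomposed.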

-i EXTENSION/--in-place EXTENSION

Converts the named files to the desired normalization form in place. This option requires filenames to be given as arguments. EXTENSION is the extension given to the backup copies of the original files.
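The backup-then-rewrite behaviour can be approximated in a few lines of Python. This is a sketch only: the function normalize_in_place is hypothetical, and the tool itself may implement in-place editing differently.

```python
import shutil
import unicodedata


def normalize_in_place(path, form, backup_ext):
    """Rewrite `path` in normalization form `form`, keeping a backup copy.

    The original is first copied to path + backup_ext (for example
    "file.txt.bak"), then the file is overwritten with normalized text.
    """
    backup_path = path + backup_ext
    shutil.copy2(path, backup_path)
    # newline="" leaves the file's line endings untouched.
    with open(backup_path, encoding="utf-8", newline="") as src:
        text = src.read()
    with open(path, "w", encoding="utf-8", newline="") as dst:
        dst.write(unicodedata.normalize(form, text))
    return backup_path
```

For instance, normalize_in_place("file.txt", "NFKD", ".bak") would leave the original contents in file.txt.bak.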

Examples

Convert clipboard contents to NFC (macOS):

$ pbpaste | nfc | pbcopy

Convert a file, in-place, to NFKD:

$ nfkd --in-place=.bak file.txt && rm file.txt.bak

Convert circled characters, letterlike variants, and half-width characters to their compatibility equivalents:

$ echo 'ℍ①ｶ' | nfkc
H1カ

License

© 2015, 2017 Eddie Antonio Santos. MIT Licensed.
