Universal encoding detector. This library is faster than chardet.

Project description

cChardet is a high-speed universal character encoding detector. It is a binding to charsetdetect.

Supported codecs

  • Big5

  • EUC-JP

  • EUC-KR

  • GB18030

  • HZ-GB-2312

  • IBM855

  • IBM866

  • ISO-2022-CN

  • ISO-2022-JP

  • ISO-2022-KR

  • ISO-8859-2

  • ISO-8859-5

  • ISO-8859-7

  • ISO-8859-8

  • KOI8-R

  • Shift_JIS

  • TIS-620

  • UTF-8

  • UTF-16BE

  • UTF-16LE

  • UTF-32BE

  • UTF-32LE

  • WINDOWS-1250

  • WINDOWS-1251

  • WINDOWS-1252

  • WINDOWS-1253

  • WINDOWS-1255

  • EUC-TW

  • X-ISO-10646-UCS-4-2143

  • X-ISO-10646-UCS-4-3412

  • x-mac-cyrillic

Requirements

For example, on Ubuntu 12.04:

$ sudo apt-get install build-essential python-dev cython

Installation

$ cd /tmp
$ git clone https://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ sudo python setup.py install

or

$ sudo easy_install cchardet

Example

# -*- coding: utf-8 -*-
import cchardet as chardet
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
result = chardet.detect(msg)
print(result)
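
The result is a dict with "encoding" and "confidence" keys. A minimal sketch of using it to decode the bytes follows. The helper name decode_detected, its detect and fallback parameters, and the UTF-8 fallback are illustrative assumptions and not part of cchardet. It falls back when detection returns no encoding, or when Python has no codec registered under the reported name.

```python
import codecs


def decode_detected(data, detect=None, fallback="utf-8"):
    """Decode bytes using a detected encoding (hypothetical helper).

    `detect` is any callable returning a dict with an "encoding" key.
    If it is omitted, cchardet.detect is imported and used.
    """
    if detect is None:
        import cchardet  # imported lazily so the helper works with any detector
        detect = cchardet.detect

    encoding = detect(data).get("encoding") or fallback
    try:
        codecs.lookup(encoding)  # is this name known to Python?
    except LookupError:
        encoding = fallback

    # errors="replace" avoids an exception if the guess is wrong
    return data.decode(encoding, errors="replace"), encoding
```

With the example above, this would be called as text, encoding = decode_detected(msg).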

Test

$ sudo pip install -U chardet nose  # or: sudo easy_install -U chardet nose
$ cd test
$ nosetests --nocapture tests.py

Benchmark

code: tests.TestCchardetSpeed

sample: test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt

Performance:

CPU: Intel Core i7 860 2.8GHz

RAM: DDR3-1333 16GB

Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit

Result:

chardet:        0.32 (call/s)

cchardet:       975.32 (call/s)
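
A calls-per-second figure like the ones above can be measured with a simple timing loop. The sketch below is not the code of tests.TestCchardetSpeed; the function name calls_per_second and the repeat parameter are assumptions made for illustration.

```python
import time


def calls_per_second(func, data, repeat=100):
    """Call func(data) `repeat` times and return the rate in calls per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks
    return repeat / elapsed if elapsed > 0 else float("inf")


# Usage, assuming both libraries are installed and msg holds the sample bytes:
#   import chardet, cchardet
#   print("chardet:  %.2f (call/s)" % calls_per_second(chardet.detect, msg, repeat=5))
#   print("cchardet: %.2f (call/s)" % calls_per_second(cchardet.detect, msg))
```

Absolute numbers depend on the machine, the Python version, and the sample file, so only the ratio between the two libraries is meaningful.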

License

  • The MIT License: src/cchardet

  • Other libraries' licenses: see the src/ext directory.

Thanks

Contact

My blog

Issues

Sorry for my poor English :)

Download files

Source Distribution

cchardet-0.3.5.tar.gz (619.7 kB)
