cchardet 2.1.1

cChardet is a high-speed universal character encoding detector.

cChardet

cChardet is a high-speed universal character encoding detector. It is a binding to uchardet.

Supported Languages/Encodings

  • International (Unicode)
    • UTF-8
    • UTF-16BE / UTF-16LE
    • UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 / X-ISO-10646-UCS-4-21431
  • Arabic
    • ISO-8859-6
    • WINDOWS-1256
  • Bulgarian
    • ISO-8859-5
    • WINDOWS-1251
  • Chinese
    • ISO-2022-CN
    • BIG5
    • EUC-TW
    • GB18030
    • HZ-GB-2312
  • Croatian
    • ISO-8859-2
    • ISO-8859-13
    • ISO-8859-16
    • Windows-1250
    • IBM852
    • MAC-CENTRALEUROPE
  • Czech
    • Windows-1250
    • ISO-8859-2
    • IBM852
    • MAC-CENTRALEUROPE
  • Danish
    • ISO-8859-1
    • ISO-8859-15
    • WINDOWS-1252
  • English
    • ASCII
  • Esperanto
    • ISO-8859-3
  • Estonian
    • ISO-8859-4
    • ISO-8859-13
    • Windows-1252
    • Windows-1257
  • Finnish
    • ISO-8859-1
    • ISO-8859-4
    • ISO-8859-9
    • ISO-8859-13
    • ISO-8859-15
    • WINDOWS-1252
  • French
    • ISO-8859-1
    • ISO-8859-15
    • WINDOWS-1252
  • German
    • ISO-8859-1
    • WINDOWS-1252
  • Greek
    • ISO-8859-7
    • WINDOWS-1253
  • Hebrew
    • ISO-8859-8
    • WINDOWS-1255
  • Hungarian
    • ISO-8859-2
    • WINDOWS-1250
  • Irish Gaelic
    • ISO-8859-1
    • ISO-8859-9
    • ISO-8859-15
    • WINDOWS-1252
  • Italian
    • ISO-8859-1
    • ISO-8859-3
    • ISO-8859-9
    • ISO-8859-15
    • WINDOWS-1252
  • Japanese
    • ISO-2022-JP
    • SHIFT_JIS
    • EUC-JP
  • Korean
    • ISO-2022-KR
    • EUC-KR / UHC
  • Lithuanian
    • ISO-8859-4
    • ISO-8859-10
    • ISO-8859-13
  • Latvian
    • ISO-8859-4
    • ISO-8859-10
    • ISO-8859-13
  • Maltese
    • ISO-8859-3
  • Polish
    • ISO-8859-2
    • ISO-8859-13
    • ISO-8859-16
    • Windows-1250
    • IBM852
    • MAC-CENTRALEUROPE
  • Portuguese
    • ISO-8859-1
    • ISO-8859-9
    • ISO-8859-15
    • WINDOWS-1252
  • Romanian
    • ISO-8859-2
    • ISO-8859-16
    • Windows-1250
    • IBM852
  • Russian
    • ISO-8859-5
    • KOI8-R
    • WINDOWS-1251
    • MAC-CYRILLIC
    • IBM866
    • IBM855
  • Slovak
    • Windows-1250
    • ISO-8859-2
    • IBM852
    • MAC-CENTRALEUROPE
  • Slovene
    • ISO-8859-2
    • ISO-8859-16
    • Windows-1250
    • IBM852
    • M

Example

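# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the sample as raw bytes; detect() works on bytes, not decoded text.
with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns the guessed encoding together with a confidence value.
result = chardet.detect(msg)
print(result)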
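
The detection result is usually used to decode the bytes that were read. The following is a minimal sketch of that step, assuming the result is a chardet-style dict whose "encoding" value may be None when nothing could be detected. decode_with_detection and its fallback parameter are hypothetical names introduced for illustration; they are not part of cChardet.

```python
def decode_with_detection(data, detect=None, fallback="utf-8"):
    """Decode bytes with a detected encoding, falling back if detection fails.

    `detect` is any callable that takes bytes and returns a dict with an
    "encoding" key. If it is omitted, cchardet.detect is used.
    """
    if detect is None:
        # Imported lazily so the helper can also be exercised with a stub.
        import cchardet
        detect = cchardet.detect

    result = detect(data)
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Either Python has no codec under the reported name, or the guess
        # was wrong. Decode with the fallback and replace undecodable bytes.
        return data.decode(fallback, errors="replace")
```

Passing the detector as an argument keeps the helper independent of which detection library is installed. Not every encoding name reported by a detector is guaranteed to map to a Python codec, which is why LookupError is handled alongside UnicodeDecodeError.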

Benchmark

$ cd src/
$ pip install chardet
$ python tests/bench.py

Results

CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz

RAM: DDR3 1600MHz 16GB

Platform: Ubuntu 16.04 amd64

Python 2.7.13

                     Request (call/s)
  chardet v3.0.2                 0.36
  cchardet v2.0.1             1396.42

Python 3.6.1

                     Request (call/s)
  chardet v3.0.2                 0.35
  cchardet v2.0.1             1467.77
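
A calls-per-second figure like the ones above can be obtained by timing repeated calls to a detection function. The sketch below is not tests/bench.py, whose contents are not shown here; calls_per_second and its repeat parameter are hypothetical names, and time.perf_counter requires Python 3.3 or later.

```python
import time


def calls_per_second(func, data, repeat=100):
    """Return how many func(data) calls complete per second.

    The figure is an average over `repeat` consecutive calls.
    """
    start = time.perf_counter()
    for _ in range(repeat):
        func(data)
    elapsed = time.perf_counter() - start
    if elapsed == 0:
        # The timer could not resolve the run; report an unbounded rate
        # instead of dividing by zero.
        return float("inf")
    return repeat / elapsed
```

To compare two detectors, call it once per library with the same bytes, for example calls_per_second(cchardet.detect, msg) and calls_per_second(chardet.detect, msg).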

LICENSE

See COPYING file.

CHANGES

2.1.1 (2017-07-01)

  • fix differing results with different chunk sizes
  • fix assignments to nsSMState in nsCodingStateMachine that resulted in unspecified behavior
  • include COPYING in package

2.1.0 (2017-05-15)

2.0.1 (2017-04-25)

  • fix an issue where UTF-8 with a BOM would not be detected as UTF-8-SIG (fix #28)
  • allow a NULL byte to be passed to feed() / detect() (fix #27)
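
The UTF-8 versus UTF-8-SIG distinction matters when the detected name is passed on to Python's codecs. The sketch below uses only the standard library and does not call cChardet; has_utf8_bom is a hypothetical helper name.

```python
import codecs


def has_utf8_bom(data):
    """Return True if the bytes start with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


raw = codecs.BOM_UTF8 + "héllo".encode("utf-8")

# The plain "utf-8" codec keeps the BOM as a leading U+FEFF character.
text_plain = raw.decode("utf-8")

# The "utf-8-sig" codec strips the BOM while decoding.
text_sig = raw.decode("utf-8-sig")
```

Decoding with the name a detector reports for BOM-prefixed input therefore avoids a stray U+FEFF at the start of the text.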

2.0.0 (2017-04-06)

  • Improve tests

2.0a4 (2017-04-05)

  • Update uchardet repo (Fix buffer overflow)

2.0a3 (2017-03-29)

  • Implement UniversalDetector (like chardet)
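
A detector of this kind is normally fed data in chunks until it is confident. The sketch below assumes the class mirrors chardet's interface (feed(), the done flag, close(), and result), as the entry above suggests. detect_stream and read_chunks are hypothetical helper names, not part of cChardet.

```python
def read_chunks(fileobj, size=4096):
    """Yield successive byte chunks from a file opened in binary mode."""
    while True:
        chunk = fileobj.read(size)
        if not chunk:
            return
        yield chunk


def detect_stream(chunks, detector=None):
    """Feed byte chunks to a chardet-style detector and return its result."""
    if detector is None:
        # Imported lazily so the helper can also be exercised with a stub.
        from cchardet import UniversalDetector
        detector = UniversalDetector()

    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:
            # The detector is already confident; skip the remaining chunks.
            break
    detector.close()
    return detector.result
```

With a file opened in "rb" mode as f, detect_stream(read_chunks(f)) avoids loading the whole file into memory before detection.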

2.0a2 (2017-03-28)

  • Update uchardet repo (Fix memory leak)

2.0a1 (2017-03-28)

1.1.3 (2017-02-26)

  • Support AArch64

1.1.2 (2017-01-08)

  • Support Python 3.6

1.1.1 (2016-11-05)

  • Use len() function (9e61cb9e96b138b0d18e5f9e013e144202ae4067)
  • Remove detect function in _cchardet.pyx (25b581294fc0ae8f686ac9972c8549666766f695)
  • Support manylinux1 wheel

1.1.0 (2016-10-17)

  • Add Detector class
  • Improve unit tests
 
File                                              Type          Py Version  Uploaded on  Size
cchardet-2.1.1-cp27-cp27m-manylinux1_i686.whl     Python Wheel  cp27        2017-07-01   186KB
cchardet-2.1.1-cp27-cp27m-manylinux1_x86_64.whl   Python Wheel  cp27        2017-07-01   195KB
cchardet-2.1.1-cp27-cp27m-win32.whl               Python Wheel  cp27        2017-07-01   85KB
cchardet-2.1.1-cp27-cp27m-win_amd64.whl           Python Wheel  cp27        2017-07-01   88KB
cchardet-2.1.1-cp27-cp27mu-manylinux1_i686.whl    Python Wheel  cp27        2017-07-01   186KB
cchardet-2.1.1-cp27-cp27mu-manylinux1_x86_64.whl  Python Wheel  cp27        2017-07-01   195KB
cchardet-2.1.1-cp34-cp34m-manylinux1_i686.whl     Python Wheel  cp34        2017-07-01   188KB
cchardet-2.1.1-cp34-cp34m-manylinux1_x86_64.whl   Python Wheel  cp34        2017-07-01   197KB
cchardet-2.1.1-cp34-cp34m-win32.whl               Python Wheel  cp34        2017-07-01   86KB
cchardet-2.1.1-cp34-cp34m-win_amd64.whl           Python Wheel  cp34        2017-07-01   88KB
cchardet-2.1.1-cp35-cp35m-manylinux1_i686.whl     Python Wheel  cp35        2017-07-01   188KB
cchardet-2.1.1-cp35-cp35m-manylinux1_x86_64.whl   Python Wheel  cp35        2017-07-01   197KB
cchardet-2.1.1-cp35-cp35m-win32.whl               Python Wheel  cp35        2017-07-01   88KB
cchardet-2.1.1-cp35-cp35m-win_amd64.whl           Python Wheel  cp35        2017-07-01   91KB
cchardet-2.1.1-cp36-cp36m-manylinux1_i686.whl     Python Wheel  cp36        2017-07-01   188KB
cchardet-2.1.1-cp36-cp36m-manylinux1_x86_64.whl   Python Wheel  cp36        2017-07-01   197KB
cchardet-2.1.1-cp36-cp36m-win32.whl               Python Wheel  cp36        2017-07-01   88KB
cchardet-2.1.1-cp36-cp36m-win_amd64.whl           Python Wheel  cp36        2017-07-01   91KB
cchardet-2.1.1.tar.gz                             Source                    2017-07-01   630KB