
encutils 0.9

Encoding detection collection for Python.

===================================================
encutils - encoding detection collection for Python
===================================================
:Version: 0.9
:Author: Christof Hoeke, see http://cthedot.de/encutils/
:Contributor: Robert Siemer
:Copyright: 2005-2009: Christof Hoeke
:License: encutils is dual-licensed; choose whichever license you prefer:

    * encutils is published under the
      LGPL 3 or later
    * encutils is published under the
      Creative Commons License.

    encutils is free software: you can redistribute it and/or modify
    it under the terms of the GNU Lesser General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    encutils is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU Lesser General Public License for more details.

    You should have received a copy of the GNU Lesser General Public License
    along with encutils.  If not, see .


A collection of helper functions to detect the encoding of text files (such as HTML, XHTML, XML or CSS) retrieved via HTTP, from a file or from a string.
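
When only a file or a byte string is available, the first clues are usually a byte order mark (BOM) or the ``encoding`` pseudo-attribute of an XML declaration. The following is a minimal, standard-library-only sketch of that check. ``sniff_encoding`` is a hypothetical name used for illustration and is not part of encutils; the sketch also assumes an ASCII-compatible XML declaration, so it does not decode UTF-16 or UTF-32 declarations that lack a BOM.

```python
import codecs
import re

# Byte order marks, checked longest first so that a UTF-32 LE BOM
# (FF FE 00 00) is not mistaken for the UTF-16 LE BOM (FF FE) it starts with.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# encoding="..." inside an ASCII-compatible XML declaration at the very start.
_XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?\sencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def sniff_encoding(data: bytes):
    """Return (encoding, source) found in a BOM or XML declaration.

    Hypothetical helper for illustration, not part of the encutils API.
    Returns (None, None) when neither clue is present.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, "BOM"
    match = _XML_DECL.match(data)
    if match:
        # Encoding names are case-insensitive; normalise to lower case.
        return match.group(1).decode("ascii").lower(), "XML declaration"
    return None, None
```

For example, ``sniff_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')`` returns ``('iso-8859-1', 'XML declaration')``, while plain bytes without a BOM or declaration yield ``(None, None)``.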

:func:`getEncodingInfo` is probably the main function of interest: it calls
the other supplied functions itself, gathers all of their information and
returns an :class:`EncodingInfo` object.

example::

    >>> import encutils
    >>> info = encutils.getEncodingInfo(url='http://cthedot.de/encutils/')

    >>> print(info) # = str(info)
    utf-8

    >>> print(repr(info)) # doctest:+ELLIPSIS


    >>> print(info.logtext)
    HTTP media_type: text/html
    HTTP encoding: utf-8
    Encoding (probably): utf-8 (Mismatch: False)
    
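The ``HTTP media_type`` and ``HTTP encoding`` lines in ``logtext`` presumably come from the ``Content-Type`` response header. As a rough illustration only, the standard library's ``email.message.Message`` can split such a header into its media type and ``charset`` parameter. ``http_encoding_info`` is a hypothetical helper, not encutils' own implementation.

```python
from email.message import Message


def http_encoding_info(content_type: str):
    """Split an HTTP Content-Type header value into (media_type, charset).

    Hypothetical helper built on the standard library's email header
    parsing, not on encutils. charset is None when the header carries
    no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() returns the lower-cased media type and falls back
    # to "text/plain" if the type given in the header is malformed.
    # get_content_charset() returns the charset parameter in lower case.
    return msg.get_content_type(), msg.get_content_charset()
```

For instance, ``http_encoding_info('text/html; charset=UTF-8')`` returns ``('text/html', 'utf-8')``.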

references
    XML
        RFC 3023 (http://www.ietf.org/rfc/rfc3023.txt)

        explained more simply in
            - http://feedparser.org/docs/advanced.html
            - http://www.xml.com/pub/a/2004/07/21/dive.html

    HTML
        http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
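
The precedence rules in these references can be condensed into a small decision function. This is a simplified sketch under stated assumptions, not encutils' actual logic: ``resolve_encoding`` is a hypothetical name, the in-document encoding (BOM, XML declaration or HTML ``<meta>``) is assumed to have been extracted already, and the HTML 4 fallback to a ``charset`` attribute on a linking element is left out.

```python
from typing import Optional


def resolve_encoding(media_type: str,
                     http_charset: Optional[str],
                     declared: Optional[str]) -> Optional[str]:
    """Pick an encoding following RFC 3023 (XML) and HTML 4.01 sec. 5.2.2.

    media_type:   lower-case media type from the HTTP Content-Type header
    http_charset: charset parameter of that header, or None
    declared:     encoding found in the document itself, or None

    Hypothetical, simplified sketch; not encutils' own logic.
    """
    is_xml = media_type.endswith("/xml") or media_type.endswith("+xml")
    if is_xml and media_type.startswith("text/"):
        # RFC 3023: for text/xml the HTTP charset is authoritative and the
        # in-document declaration is ignored; without it, us-ascii applies.
        return http_charset or "us-ascii"
    if is_xml:
        # application/xml and */*+xml: HTTP charset first, then the
        # document's own BOM or declaration, then the XML default utf-8.
        return http_charset or declared or "utf-8"
    # HTML 4.01 (and a reasonable default for other text types): the HTTP
    # charset parameter first, then the in-document <meta> declaration.
    return http_charset or declared
```

The ``text/xml`` branch is the counter-intuitive one: under RFC 3023, a ``text/xml`` document served without a ``charset`` parameter is ``us-ascii`` even if its XML declaration says otherwise, whereas ``application/xml`` lets the declaration decide.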

TODO
    - parse @charset of HTML elements?
    - check for more text types if only a text string is given
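
====================== ========== ========== ========================== ==== ===========
File                   Type       Py Version Uploaded on                Size # downloads
====================== ========== ========== ========================== ==== ===========
encutils-0.9-py2.4.egg Python Egg 2.4        2009-04-23 15:08:41.907462 23KB 89
encutils-0.9.zip       Source                2009-04-23 15:08:31.292463 30KB 133
encutils-0.9-py2.6.egg Python Egg 2.6        2009-04-23 15:09:00.151946 23KB 93
encutils-0.9-py2.5.egg Python Egg 2.5        2009-04-23 15:08:51.786757 23KB 100
====================== ========== ========== ========================== ==== ===========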
