encutils 0.9
Encoding detection collection for Python.
===================================================
encutils - encoding detection collection for Python
===================================================
:Version: 0.9
:Author: Christof Hoeke, see http://cthedot.de/encutils/
:Contributor: Robert Siemer
:Copyright: 2005-2009: Christof Hoeke
:License: encutils has a dual-license, please choose whatever you prefer:

    * encutils is published under the
      `LGPL 3 or later <http://cthedot.de/encutils/license/>`__
    * encutils is published under the
      `Creative Commons License <http://creativecommons.org/licenses/by/3.0/>`__.
encutils is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
encutils is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with encutils. If not, see <http://www.gnu.org/licenses/>.

A collection of helper functions that detect the encoding of text resources
(HTML, XHTML, XML, CSS, etc.) retrieved via HTTP, from a file, or from a string.

:func:`getEncodingInfo` is probably the main function of interest: it calls the
other supplied functions, gathers their results, and returns an
:class:`EncodingInfo` object.

example::

    >>> import encutils
    >>> info = encutils.getEncodingInfo(url='http://cthedot.de/encutils/')
    >>> print info # = str(info)
    utf-8
    >>> print repr(info) # doctest:+ELLIPSIS
    >>> print info.logtext
    HTTP media_type: text/html
    HTTP encoding: utf-8
    Encoding (probably): utf-8 (Mismatch: False)
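
The log text in the example reports the media type and charset taken from the HTTP header and whether they disagree with what the document itself declares. The following is a minimal sketch of that kind of comparison, not the encutils implementation. It uses only the Python 3 standard library (the doctest in this README is Python 2), and the names ``parse_content_type``, ``declared_xml_encoding``, ``canonical`` and ``compare`` are hypothetical helpers introduced for illustration. It only looks at a leading XML declaration and ignores byte order marks and HTML ``meta`` elements.

```python
import codecs
import re
from email.message import Message

# Hypothetical helper names for illustration; none of these are encutils functions.

# Matches the encoding pseudo-attribute of an XML declaration at the very start.
XML_DECL = re.compile(rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def parse_content_type(header_value):
    """Return (media_type, charset or None) from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_content_charset()


def declared_xml_encoding(body):
    """Return the encoding named in a leading XML declaration, or None."""
    match = XML_DECL.match(body)
    return match.group(1).decode("ascii") if match else None


def canonical(name):
    """Normalize aliases such as 'utf8' and 'UTF-8' to one codec name."""
    if name is None:
        return None
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Unknown to Python: fall back to a case-insensitive comparison.
        return name.lower()


def compare(header_value, body):
    """Return (media_type, http_charset, document_charset, mismatch)."""
    media_type, http_charset = parse_content_type(header_value)
    doc_charset = declared_xml_encoding(body)
    mismatch = (
        http_charset is not None
        and doc_charset is not None
        and canonical(http_charset) != canonical(doc_charset)
    )
    return media_type, http_charset, doc_charset, mismatch
```

A mismatch is only flagged when both sources name an encoding; a missing declaration on either side is treated as "no information" rather than as a conflict.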

references
    XML
        RFC 3023 (http://www.ietf.org/rfc/rfc3023.txt)

        easier explained in

        - http://feedparser.org/docs/advanced.html
        - http://www.xml.com/pub/a/2004/07/21/dive.html
    HTML
        http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
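
As a rough illustration of the RFC 3023 rules referenced above: a ``charset`` parameter in the HTTP header takes precedence; without one, ``text/xml`` defaults to ``us-ascii`` regardless of the XML declaration, while ``application/xml`` falls back to the document's own declaration and then to ``utf-8``. The function below is a hedged sketch of that precedence only. The name ``xml_encoding_per_rfc3023`` is hypothetical and not part of encutils, byte order marks are ignored, and only ``text/xml``, ``application/xml`` and their ``+xml`` variants are handled.

```python
def xml_encoding_per_rfc3023(media_type, http_charset, declared_encoding):
    """Pick an encoding for an XML resource following RFC 3023 precedence.

    media_type        -- e.g. "text/xml" or "application/atom+xml"
    http_charset      -- charset parameter of the Content-Type header, or None
    declared_encoding -- encoding from the XML declaration, or None
    Returns a lowercase encoding name, or None for non-XML media types.
    """
    media_type = media_type.lower()
    is_text_xml = media_type == "text/xml" or (
        media_type.startswith("text/") and media_type.endswith("+xml")
    )
    is_app_xml = media_type == "application/xml" or (
        media_type.startswith("application/") and media_type.endswith("+xml")
    )
    if not (is_text_xml or is_app_xml):
        return None  # not covered by this sketch

    # 1. An explicit charset parameter is authoritative for both families.
    if http_charset:
        return http_charset.lower()

    # 2. text/xml without charset: us-ascii, the XML declaration is ignored.
    if is_text_xml:
        return "us-ascii"

    # 3. application/xml without charset: use the declaration, else utf-8.
    #    (BOM-based detection is deliberately left out of this sketch.)
    return (declared_encoding or "utf-8").lower()
```

For example, ``xml_encoding_per_rfc3023("text/xml", None, "iso-8859-1")`` yields ``"us-ascii"``, which is the case that most often surprises people and the reason a mismatch report is useful.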

TODO
    - parse @charset of HTML elements?
    - check for more texttypes if only text given

| File | Type | Py Version | Uploaded on | Size | # downloads |
|---|---|---|---|---|---|
| encutils-0.9-py2.4.egg (md5) | Python Egg | 2.4 | 2009-04-23 | 23KB | 530 |
| encutils-0.9-py2.5.egg (md5) | Python Egg | 2.5 | 2009-04-23 | 23KB | 657 |
| encutils-0.9-py2.6.egg (md5) | Python Egg | 2.6 | 2009-04-23 | 23KB | 1137 |
| encutils-0.9.zip (md5) | Source | | 2009-04-23 | 30KB | 1924 |
- Author: Christof Hoeke
- Home Page: http://cthedot.de/encutils/
- Download URL: http://cthedot.de/encutils/
- Keywords: encoding,i18n,xml,html,css
- License: encutils has a dual-license, please choose whatever you prefer:

  * encutils is published under the `LGPL 3 or later <http://cthedot.de/encutils/license/>`__
  * encutils is published under the `Creative Commons License <http://creativecommons.org/licenses/by/3.0/>`__.

- Platform: Python 2.3 and later.
- Categories:

  - Development Status :: 4 - Beta
  - Environment :: Web Environment
  - Intended Audience :: Developers
  - License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
  - License :: Other/Proprietary License
  - Operating System :: OS Independent
  - Programming Language :: Python
  - Topic :: Internet
  - Topic :: Internet :: WWW/HTTP
  - Topic :: Software Development :: Internationalization
  - Topic :: Software Development :: Libraries :: Python Modules
  - Topic :: Text Processing :: Markup :: HTML
  - Topic :: Text Processing :: Markup :: XML

- Package Index Owner: christof
- DOAP record: encutils-0.9.xml
