skip to navigation
skip to content

pytidylib 0.3.2

Python wrapper for HTML Tidy (tidylib) on Python 2 and 3

Package Documentation

`PyTidyLib`_ is a Python package that wraps the `HTML Tidy`_ library. This
allows you, from Python code, to "fix" invalid (X)HTML markup. Some of the
library's many capabilities include:

* Clean up unclosed tags and unescaped characters such as ampersands
* Output HTML 4 or XHTML, strict or transitional, and add missing doctypes
* Convert named entities to numeric entities, which can then be used in XML
documents without an HTML doctype.
* Clean up HTML from programs such as Word (to an extent)
* Indent the output, including proper (i.e. no) indenting for ``pre`` elements,
which some (X)HTML indenting code overlooks.

Changes
=======

* 0.3.2: Initialization bug fix

* 0.3.1: find_library support while still allowing a list of library names

* 0.3.0: Refactored to use Tidy and PersistentTidy classes while keeping the
functional interface (which will lazily create a global Tidy() object) for
backward compatibility. You can now pass a list of library names and base
options when instantiating Tidy. The keep_doc argument is now deprecated
and does nothing; use PersistentTidy.

* 0.2.4: Bugfix for a strange memory allocation corner case in Tidy.

* 0.2.3: Python 3 support (2 + 3 cross compatible) with passing Tox tests.

Small example of use
====================

The following code cleans up an invalid HTML document and sets an option::

from tidylib import tidy_document
document, errors = tidy_document('''<p>fõo <img src="bar.jpg">''',
options={'numeric-entities':1})
print document
print errors

Docs
====

Documentation is shipped with the source distribution and is available at
the `PyTidyLib`_ web page.

.. _`HTML Tidy`: http://tidy.sourceforge.net/
.. _`PyTidyLib`: http://countergram.com/open-source/pytidylib/  
File Type Py Version Uploaded on Size
pytidylib-0.3.2.tar.gz (md5) Source 2016-11-16 85KB