Extract difference between two html pages
Project description
This package allows you to extract the difference between two HTML pages: given pages A and B, it will try to extract the parts of A that are changed in B. It uses lxml.html.diff under the hood, but provides only the changed parts as HTML.
It currently requires Python 3.
License is MIT.
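For comparison, here is roughly what lxml.html.diff gives you on its own. This is only a sketch using lxml's public htmldiff function; whether this package calls htmldiff or some other part of lxml.html.diff internally is an assumption, not something stated here. The names old_html, new_html and marked are illustrative.

```python
from lxml.html.diff import htmldiff

old_html = '<div> <h1>My site</h1> <div>My content</div> </div>'
new_html = '<div> <h1>My site</h1> <div>Other content</div> </div>'

# htmldiff returns the whole merged document as a string, wrapping
# inserted text in <ins> tags and deleted text in <del> tags.
marked = htmldiff(old_html, new_html)
print(marked)
```

The result contains the unchanged parts of the page as well, with changes marked inline, whereas this package returns only the changed parts.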
Installation
You can install the package from PyPI:
pip install extract-html-diff
Usage
You can extract the diff as text:
import extract_html_diff
html = '<div> <h1>My site</h1> <div>My content</div> </div>'
other_html = '<div> <h1>My site</h1> <div>Other content</div> </div>'
extract_html_diff.as_string(html, other_html)
This will give you:
'<div><div>My content</div> </div>'
You can also get the diff as a tree (an lxml.html.HtmlElement) if you plan to do additional transformations or change the serialization:
extract_html_diff.as_tree(html, other_html)
You can pass the input HTML as str or bytes (in which case it will be parsed with lxml.html.fromstring), or as an already parsed lxml.html.HtmlElement.