pdf-diff

Finds differences between two PDF documents:

  1. Compares the text layers of two PDF documents and outputs the bounding boxes of changed text in JSON.
  2. Rasterizes the changed pages in the PDFs to a PNG and draws red outlines around changed text.
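The first step above can be pictured with a minimal sketch. It is not pdf-diff's actual implementation: it assumes the word boxes come from `pdftotext -bbox` (which emits `<word>` elements carrying `xMin`, `yMin`, `xMax`, `yMax` attributes), it swaps in the standard library's `difflib` for the comparison, and the function names and the JSON layout are hypothetical.

```python
import difflib
import json
import xml.etree.ElementTree as ET


def words_with_boxes(bbox_xhtml):
    """Return (text, page, box) for every <word> in `pdftotext -bbox` style output."""
    words = []
    page_number = 0
    for element in ET.fromstring(bbox_xhtml).iter():
        tag = element.tag.rsplit("}", 1)[-1]  # drop the XHTML namespace, if any
        if tag == "page":
            page_number += 1
        elif tag == "word":
            box = {k: float(element.get(k)) for k in ("xMin", "yMin", "xMax", "yMax")}
            words.append((element.text or "", page_number, box))
    return words


def changed_boxes(before_xhtml, after_xhtml):
    """Diff the two word sequences and collect the boxes of words that differ."""
    before = words_with_boxes(before_xhtml)
    after = words_with_boxes(after_xhtml)
    matcher = difflib.SequenceMatcher(
        a=[w[0] for w in before], b=[w[0] for w in after], autojunk=False)
    changes = []
    for op, a1, a2, b1, b2 in matcher.get_opcodes():
        if op == "equal":
            continue
        # Hypothetical record layout: which PDF, which page, the word, its box.
        for side, span in (("before", before[a1:a2]), ("after", after[b1:b2])):
            for text, page, box in span:
                changes.append({"pdf": side, "page": page, "text": text, **box})
    return changes


if __name__ == "__main__":
    # Simplified stand-ins for real `pdftotext -bbox` output.
    page = '<doc><page width="100" height="100">{}</page></doc>'
    hello = '<word xMin="1" yMin="2" xMax="3" yMax="4">Hello</word>'
    old = page.format(hello + '<word xMin="5" yMin="2" xMax="8" yMax="4">world</word>')
    new = page.format(hello + '<word xMin="5" yMin="2" xMax="8" yMax="4">there</word>')
    print(json.dumps(changed_boxes(old, new), indent=2))
```

Comparing word by word keeps every changed item tied to exactly one bounding box, which is what the outlining step needs.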

Example Image Output

The script is written in Python 3 and relies on the pdftotext program, which is part of poppler.
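Because the script depends on an external program, it can help to confirm that pdftotext is reachable before running a comparison. The following is a small sketch using only the standard library; the function name `find_pdftotext` is hypothetical and not part of pdf-diff.

```python
import shutil
import subprocess


def find_pdftotext():
    """Return the full path to pdftotext, or None if it is not on PATH."""
    return shutil.which("pdftotext")


if __name__ == "__main__":
    path = find_pdftotext()
    if path is None:
        print("pdftotext not found; install poppler (see Requirements)")
    else:
        # -v prints version information; read both streams to be safe.
        result = subprocess.run([path, "-v"], capture_output=True, text=True)
        print((result.stderr or result.stdout).strip())
```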

Requirements

libxml2 >= 2.7.0, libxslt >= 1.1.23, poppler

Installing the requirements on Ubuntu:

sudo apt-get install python3-lxml poppler-utils

Installing the requirements on OS X:

brew install libxml2 libxslt poppler
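To check the libxml2 and libxslt minimums after installation, one option is to ask lxml which library versions it is running against. This sketch assumes the libraries are reached through lxml (as the Ubuntu package above suggests); `meets_minimum`, `installed_versions`, and `MINIMUMS` are hypothetical names, and the function returns an empty result if lxml is not importable.

```python
# Minimum versions listed under Requirements.
MINIMUMS = {"libxml2": (2, 7, 0), "libxslt": (1, 1, 23)}


def meets_minimum(found, required):
    """Compare two version tuples such as (2, 9, 1) and (2, 7, 0)."""
    return tuple(found) >= tuple(required)


def installed_versions():
    """Return the library versions lxml reports, or {} if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return {}
    return {"libxml2": etree.LIBXML_VERSION, "libxslt": etree.LIBXSLT_VERSION}


if __name__ == "__main__":
    versions = installed_versions()
    if not versions:
        print("lxml is not installed, so the versions could not be checked")
    for name, found in versions.items():
        status = "ok" if meets_minimum(found, MINIMUMS[name]) else "too old"
        print(name, ".".join(str(part) for part in found), status)
```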

Installation

From PyPI:

pip install pdf-diff

From source:

sudo python3 setup.py install

Running

Turn two PDFs into one large PNG image showing the differences:

pdf-diff before.pdf after.pdf > comparison_output.png
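The same command can be driven from Python when the comparison is part of a larger script. This is a sketch under stated assumptions: it relies only on the behavior shown above (the PNG is written to standard output), and `build_command` and `write_comparison` are hypothetical helper names rather than anything pdf-diff provides.

```python
import subprocess

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_command(before_pdf, after_pdf):
    """Build the argument list for the pdf-diff command shown above."""
    return ["pdf-diff", str(before_pdf), str(after_pdf)]


def write_comparison(before_pdf, after_pdf, output_png):
    """Run pdf-diff and save its standard output as a PNG file."""
    result = subprocess.run(build_command(before_pdf, after_pdf),
                            stdout=subprocess.PIPE, check=True)
    if not result.stdout.startswith(PNG_SIGNATURE):
        raise ValueError("pdf-diff did not write a PNG to standard output")
    with open(output_png, "wb") as handle:
        handle.write(result.stdout)
    return len(result.stdout)


if __name__ == "__main__":
    size = write_comparison("before.pdf", "after.pdf", "comparison_output.png")
    print("wrote", size, "bytes")
```

Checking the signature before writing avoids leaving an empty or corrupt file behind when the tool produces no image.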

Download files

Source Distribution

pdf-diff-0.9.1.tar.gz (8.0 kB)

Built Distributions

pdf_diff-0.9.1-py3.5.egg (15.6 kB)

pdf_diff-0.9.1-py3-none-any.whl (11.6 kB, Python 3)
