A Plone 4 product that generates image thumbnail previews of PDF files stored on ATFile based objects.

Introduction

PdfPeek is a Plone 4 add-on product that uses GNU Ghostscript to generate image thumbnail previews of PDF files uploaded to ATFile based content objects. Dexterity (and plone.app.contenttypes) support was added in 2.0.0.

  • This product, when installed in a Plone 4.x site, will automatically generate preview and thumbnail images of each page of uploaded PDF files and store them annotated onto the content object containing the PDF file.

  • Image generation from the PDF file is processed asynchronously so that the user does not have to wait for the images to be created in order to continue using the site, as the processing of large PDF files can take many minutes to complete.

    Since 2.0.0, PdfPeek can also use RabbitMQ message queuing to generate thumbnails; see the Installation section for details.

  • When a file object is initialized or edited, PdfPeek checks whether a PDF file was uploaded. If so, a Ghostscript image conversion job is added to the pdfpeek job queue (or to RabbitMQ when collective.zamqp is used).

  • If the file uploaded is not of content type ‘application/pdf’, an image removal job is added to the pdfpeek job queue. This job queue is processed periodically by a cron job or a zope clock server process. The image conversion jobs add the IPDF interface to the content object and store the resulting image preview and thumbnail for each page of the PDF annotated on to the content object itself. The image removal jobs remove the image annotations and the IPDF interface from the content object.

  • If a job fails, it is removed from the processing queue and appended to a list of failed jobs. If a job succeeds, it is removed from the processing queue and appended to a list of successfully completed jobs.
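
The queue behaviour described in the bullets above can be sketched in plain Python. This is a minimal illustration, not PdfPeek's actual implementation: the JobQueue class, its attribute names, and the choose_job helper are hypothetical, and jobs are modelled as plain callables.

```python
from collections import deque


def choose_job(content_type):
    """Pick the job kind for an uploaded file, as the bullets above describe."""
    return "convert" if content_type == "application/pdf" else "remove"


class JobQueue:
    """Hypothetical stand-in for the pdfpeek job queue (all names are assumptions)."""

    def __init__(self):
        self.pending = deque()   # jobs waiting to be processed
        self.finished = []       # names of jobs that completed
        self.failed = []         # names of jobs that raised an error

    def add(self, name, func):
        self.pending.append((name, func))

    def process(self):
        """Run every pending job once; a job leaves the queue whether it succeeds or fails."""
        while self.pending:
            name, func = self.pending.popleft()
            try:
                func()
            except Exception:
                self.failed.append(name)
            else:
                self.finished.append(name)


def _broken_job():
    # Simulates a conversion that fails, for example on a corrupt PDF.
    raise ValueError("corrupt PDF")


queue = JobQueue()
queue.add("good.pdf", lambda: None)
queue.add("bad.pdf", _broken_job)
queue.process()
print(queue.finished, queue.failed)
```

In this sketch a failed job is never retried automatically, which matches the description above: it is simply moved to the failed list for later inspection.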

Viewlet

PdfPeek ships with an example user interface that is turned on by default. This UI displays the thumbnail images of each page of the PDF file when a user views the content object in their browser. This example UI is not quite working yet, and is meant to be just that, an example. I don’t claim to be a javascript master.

A custom traverser is available to make it easy to access the images and previews directly, as well as to build custom views incorporating image previews of file content.

Installation

Use zc.buildout to install. For asynchronous queue processing with collective.zamqp, add the collective.pdfpeek [zamqp] extra. The available extras are:

  • collective.pdfpeek [dexterity] for dexterity support

  • collective.pdfpeek [archetype] for archetype support

  • collective.pdfpeek [zamqp] for collective.zamqp support

You can also combine those extras as shown below (see buildout-zamqp.cfg for a working buildout configuration):

[buildout]
...
parts =
    instance

[instance]
recipe = plone.recipe.zope2instance
user = admin:admin
http-address = 8080
eggs =
    ...
    collective.pdfpeek [dexterity, zamqp]

zope-conf-additional =
    %import collective.zamqp
    <amqp-broker-connection>
        connection_id   superuser
        hostname        127.0.0.1
        port            5672
        username        guest
        password        guest
        heartbeat       120
        keepalive       60
    </amqp-broker-connection>

Configuration

PdfPeek ships with a configlet that allows the site administrator to adjust the size of the generated preview and thumbnail images, as well as toggle the example user interface and default event handlers on and off.

Requirements

  • Plone 4.2+ (support for Plone 4.1 was dropped in 2.0.0; see the changelog)

  • Requires the GNU Ghostscript gs binary to be available on the $PATH!

  • Tested on POSIX-compliant systems such as Linux and MacOS 10.8.

  • Untested on Windows systems. (Wouldn’t be surprised if it works, as long as you can install gs.)

  • As of version 0.17, Plone 3.x is no longer officially supported.
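
Because conversion depends on the gs binary being on the $PATH, it can be worth checking for it before installing. The following is a minimal standalone sketch and not part of PdfPeek: the find_ghostscript helper is hypothetical, and it relies on shutil.which, which needs Python 3.3 or later, so run it with a system Python rather than the Python 2 interpreter of a Plone 4 instance.

```python
import shutil


def find_ghostscript(binary="gs"):
    """Return the full path of the Ghostscript binary, or None if it is not on PATH."""
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_ghostscript()
    if path is None:
        print("gs not found on PATH; install GNU Ghostscript first")
    else:
        print("found Ghostscript at %s" % path)
```

Run the check as the same system user that runs the Zope instance, since that user's $PATH is the one that matters.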

Code, Issues, Comments

TODO

  • Implement a processing queue model that can process files asynchronously, more than one at a time. (This should work using RabbitMQ, but will cause conflict errors.)

  • Implement control panel for adding and removing image previews on file objects containing PDF files.

Installation

Via zc.buildout

The recommended way to use collective.pdfpeek is to install it via zc.buildout with the plone.recipe.zope2instance recipe. PdfPeek uses z3c.autoinclude to load its ZCML, so you don’t need a ZCML slug.

Add collective.pdfpeek to the list of eggs in the instance section of your buildout.cfg like so:

[instance]
...
eggs =
    ...
    collective.pdfpeek
    ...

Then re-run your buildout like so to activate the installation:

$ bin/buildout

Via setuptools

To install collective.pdfpeek into the global Python environment (or a virtualenv), using a traditional Zope 2 instance, you can do this:

  • When you’re reading this you have probably already run easy_install collective.pdfpeek. Find out how to install setuptools (and EasyInstall) here: http://peak.telecommunity.com/DevCenter/EasyInstall

  • If you are using Zope 2.9 (not 2.10), get pythonproducts and install it into your Zope instance via:

    python setup.py install --home /path/to/instance

  • Create a file called collective.pdfpeek-configure.zcml in the /path/to/instance/etc/package-includes directory. The file should only contain this:

    <include package="collective.pdfpeek" />

Configuration

Via zc.buildout

For automatic processing of the PdfPeek job queue, a simple cron script using curl or wget would suffice. It is nice to keep all of the configuration for a project in your buildout, however. For this reason, a zope clock server process is the recommended way to automatically process the job queue. You can do so by adding the following snippet to your [instance] part in your buildout configuration:

[instance]
...
zope-conf-additional=
    # process the job queue every 5 seconds
    <clock-server>
       method /Plone/@@pdfpeek.utils/process_conversion_queue
       period 5
       user admin
       password admin
       host localhost
    </clock-server>
...

You will have to edit the above snippet to customize the name of the plone site, the admin username and password, and the hostname the instance is running on. You can also adjust the interval at which the queue is processed by the clock server.

Then re-run your buildout like so to activate the clock server:

$ bin/buildout

Via cron

Install wget.

Edit your crontab file and append the following line:

*/5 * * * * wget --user=admin --password=admin http://localhost:8080/Plone/@@pdfpeek.utils/process_conversion_queue

You will have to customize the above line with the hostname, port number, username, password and path to your plone instance.

Save your crontab file and wget will now call the view method that triggers the processing of the pdf conversion queue every five minutes.
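
If wget is not available, the same request could be made from a small Python script called by cron instead. This is a sketch under assumptions: the URL, username, and password mirror the placeholders in the crontab line above, the build_request and trigger helpers are hypothetical, and the code targets the Python 3 standard library (so it runs outside the Plone 4 instance).

```python
import base64
import urllib.request

# Placeholder URL taken from the crontab example; adjust host, port, and site id.
QUEUE_URL = "http://localhost:8080/Plone/@@pdfpeek.utils/process_conversion_queue"


def build_request(url, user, password):
    """Build a GET request that carries HTTP Basic credentials."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def trigger(url, user, password, timeout=60):
    """Call the queue-processing view; return the HTTP status, or None on failure."""
    try:
        with urllib.request.urlopen(build_request(url, user, password), timeout=timeout) as response:
            return response.status
    except OSError as exc:  # covers URLError, HTTPError, and socket timeouts
        print("queue processing request failed: %s" % exc)
        return None


if __name__ == "__main__":
    # Placeholder credentials; replace them as you would in the crontab line.
    print(trigger(QUEUE_URL, "admin", "admin"))
```

The crontab entry would then call this script with a Python 3 interpreter in place of the wget command.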

Via RabbitMQ

Install rabbitmq-server on your machine. The RabbitMQ website has good documentation; see: http://www.rabbitmq.com/download.html

Instead of configuring a clock server, configure collective.zamqp as in the following example:

[buildout]
parts =
    instance
    worker

...

[instance]
recipe = plone.recipe.zope2instance
http-address = 8080
eggs =
    ...
    collective.pdfpeek [zamqp]

...
zope-conf-additional =
    %import collective.zamqp
    <amqp-broker-connection>
        connection_id   superuser
        hostname        my.rabbithostname.com
        port            5672
        username        guest
        password        guest
        heartbeat       120
        keepalive       60
    </amqp-broker-connection>

[worker]
<= instance
http-address = 8081
zserver-threads = 1
environment-vars =
    ZAMQP_LOGLEVEL INFO
zope-conf-additional =
    ${instance:zope-conf-additional}
    <amqp-consuming-server>
        connection_id   superuser
        site_id         Plone
        user_id         admin
    </amqp-consuming-server>

For advanced configuration see collective.zamqp documentation here: https://pypi.python.org/pypi/collective.zamqp

Changelog

2.0.0 (2014-12-04)

  • Update README.rst to include a configuration example [saily]

  • Fix failing tests by including metadata into annotation storage of processed files. Test updates. [saily]

  • Use abc.ABCMeta as metaclass for abstract base class. [saily]

  • Fix dependencies and don’t include collective.zamqp into tests to allow test of default event handlers. [saily]

  • Updated events and added subscriber for IObjectCreatedEvent. [agitator]

  • Drop support for Plone 4.1, Fix test setup with plone.app.contenttypes. [saily]

  • Flake8, PEP8 cleanup, remove double quotes, PEP3101, jshint, jscs and csslint checks using plone.recipe.codeanalysis. This is also done on travis. [saily]

  • Update buildout and travis config. [saily]

  • Update bootstrap.py for buildout 2.x. [saily]

2.0b2 (2013-10-17)

  • Fix missing README.rst in package. [saily]

2.0b1 (2013-10-17)

  • Add a basic behavior to allow users to create PDF thumbnails for their own dexterity content types. [saily]

  • Add collective.zamqp integration to allow queuing PDF thumbnail jobs into RabbitMQ message queuing server. [saily]

  • Switch to PyPDF2 which is maintained compared to pyPdf and can be used as a drop-in replacement. [saily]

  • Add travis-ci for Plone 4.1, Plone 4.2 and Plone 4.3. [saily]

  • Use plone.app.testing and layers for tests. Add more tests for dexterity and ATContentTypes. [saily]

  • Huge refactoring to replace transformers and functions with more flexible adapters. [saily]

  • Plone 4.3 compatibility by removing deprecated imports from zope.app.component. [saily]

  • Add a new .gitignore file. [saily]

  • Add egg-contained buildout. Rename *.txt to *.rst to support github markup directly. [saily]

  • Dexterity types integration with field retrieval using IPrimaryFieldInfo adapter. This brings full functionality for plone.app.contenttypes. [saily]

  • Updated docs. [saily]

1.3 (2011-05-31)

  • Switched to PNG from JPEG. [dbrenneman]

1.2 (2010-12-7)

  • Fixed issue where local utilities would clash if pdfpeek was installed on multiple Plone instances within the same zope. [dbrenneman]

  • Fixed uninstall profile so that local persistent utilities are removed and image annotations are removed on uninstall of product. [dbrenneman]

1.0 (2010-5-27)

  • Fixed jQuery UI. [reedobrien]

0.19 (2010-4-8)

  • Modified transform to use cStringIO instead of StringIO, in the hopes of making things more efficient. [dbrenneman]

  • Modified conversion function to grab file data from object using getFile method, as this is the proper way of doing things… [dbrenneman]

0.18 (2010-2-26)

  • Fixed bug in reST rendering of changelog. [dbrenneman]

0.17 (2010-2-26)

  • Added wide variety of pdf files to run through the unit tests for the ghostscript image transform. [dbrenneman]

  • Added unit tests for low level ghostscript transform. [dbrenneman]

  • Refactored transform code to make class and method names make more sense. [dbrenneman]

  • Updated README, including instructions for configuring the clock server. [dbrenneman]

  • Added asynchronous processing queue for ghostscript transform jobs. [dbrenneman]

  • Updated functional doctests to work on Plone 4 with blobfile storage. [dbrenneman]

  • Updated functional doctests to test transform queue. [dbrenneman]

  • Updated documentation. [dbrenneman]

  • Added unit testing harness. [dbrenneman]

0.16 (2009-12-12)

  • Bugfix release. [dbrenneman]

0.15 (2009-12-12)

  • Added configurable preview and thumbnail sizes. [claytron]

  • reST police! Fixing up the docs so that they might get rendered correctly. [claytron]

0.13 (2009-11-12)

  • Refactored transform code to deal with encrypted pdf files better. [dbrenneman]

  • Made transform code more robust. [dbrenneman]

  • Added ability to toggle default event handler on and off. [dbrenneman]

0.12 (2009-10-25)

  • Bugfix release. [dbrenneman]

0.11 (2009-10-25)

  • Bugfix release. [dbrenneman]

0.10 (2009-10-25)

  • Added code to check for EOF at the end of the pdf file data string and to insert one if it is not there. Fixes many corrupt pdf files. [dbrenneman]

0.9 (2009-10-13)

  • Fixed another bug in the transform code to allow functioning with any filefield, as long as it is called file. [dbrenneman]

0.8 (2009-10-13)

  • Fixed a bug in the transform code to allow functioning with any filefield, as long as it is called file. [dbrenneman]

0.7 (2009-10-13)

  • Streamlined transform code. [dbrenneman]

  • Added ability to toggle the pdfpeek viewlet display on and off via configlet. [dbrenneman]

0.6 (2009-10-05)

  • Bugfix release. [dbrenneman]

0.5 (2009-10-05)

  • Added control panel configlet. [dbrenneman]

  • Removed unneeded xml files from uninstall profile. [dbrenneman]

  • Optimized transform. [dbrenneman]

  • Added storage of image thumbnail along with image, generated with PIL. [dbrenneman]

  • Changed annotation to store images in a dict instead of a list. [dbrenneman]

  • Changed event handler to listen on all AT based objects instead of ATFile. [dbrenneman]

  • Added custom pdfpeek icon for configlet. [dbrenneman]

  • Added custom traverser to allow easy access to the OFS.Image.Image() objects stored on IPDF objects. [dbrenneman]

  • Modified pdfpeek viewlet code to display images using the custom traverser. [dbrenneman]

  • Added custom scrollable gallery with tooltips using jQuery Tools to the pdfpeek viewlet for display. [dbrenneman]

0.4 (2009-10-01)

  • Refactored storage to use OFS.Image.Image() objects instead of storing the raw binary data in string format. [dbrenneman]

  • Refactored event handler object variable name. [dbrenneman]

  • Removed unneeded files from default GS Ext. profile. [dbrenneman]

  • Removed unneeded javascript files and associated images and css. [dbrenneman]

0.3 - 2009-08-03

  • fixed parsing of pdf files with multiple pages [piv]

0.1 - Unreleased

  • Initial release
