wildcard.pdfpal

PDF thumbnail generation, OCR indexing and extra views, integrated with plone.app.async

These details have been verified by PyPI

Maintainers

Joel.Kleier senner tkimnguyen vangheem wildcard

These details have not been verified by PyPI

Project links

Homepage

Project description

Introduction

This package provides integrations for PDF-heavy web sites:

Generates thumbnails from PDFs
Adds a folder view for PDFs that uses the generated thumbnails
Adds OCR for PDF indexing
Everything is configurable, so you can choose not to use thumbnail generation or OCR
Ability to create searchable PDFs with hOCR
Use the @@async-monitor URL to monitor asynchronous jobs that have yet to run
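The README does not say how thumbnails are rendered. One common approach, assumed here, is to rasterize the first page with Ghostscript, which the OCR feature already requires. The sketch below only builds and runs that command; `first_page_thumbnail_cmd`, `make_thumbnail` and the 72 dpi default are hypothetical names for illustration, not part of wildcard.pdfpal.

```python
import subprocess


def first_page_thumbnail_cmd(pdf_path, png_path, dpi=72):
    """Build a Ghostscript command that renders page 1 of a PDF to PNG.

    Hypothetical helper for illustration; not pdfpal's actual code.
    """
    return [
        "gs",
        "-dSAFER",          # restrict file operations from within the PDF
        "-dBATCH",          # exit when done instead of waiting for input
        "-dNOPAUSE",        # do not pause between pages
        "-sDEVICE=png16m",  # 24-bit color PNG output
        "-dFirstPage=1",    # render only the first page
        "-dLastPage=1",
        "-r%d" % dpi,       # output resolution in dots per inch
        "-sOutputFile=%s" % png_path,
        pdf_path,
    ]


def make_thumbnail(pdf_path, png_path, dpi=72):
    # Raises subprocess.CalledProcessError if Ghostscript exits non-zero.
    subprocess.check_call(first_page_thumbnail_cmd(pdf_path, png_path, dpi))
```

A low resolution keeps thumbnails small; pdfpal's own size and quality settings may differ.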

OCR

OCR requires Ghostscript and Tesseract to be installed. Just use your package manager to install them:

# sudo apt-get install ghostscript tesseract-ocr

This will install Tesseract 2, not Tesseract 3.
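As a rough picture of how the two tools fit together, the sketch below renders each PDF page to a grayscale TIFF with Ghostscript and then runs `tesseract <image> <output base>`, which writes `<output base>.txt`. It is not pdfpal's implementation; the function names, the 300 dpi default and the `page-%04d.tif` naming are assumptions.

```python
import glob
import os
import subprocess


def pdf_to_tiff_cmd(pdf_path, out_dir, dpi=300):
    """Ghostscript command rendering every page to its own TIFF file."""
    return [
        "gs", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=tiffgray",  # 8-bit grayscale TIFF
        "-r%d" % dpi,         # OCR generally needs more resolution than thumbnails
        # Ghostscript expands %04d to the page number: page-0001.tif, ...
        "-sOutputFile=%s" % os.path.join(out_dir, "page-%04d.tif"),
        pdf_path,
    ]


def tesseract_cmd(tiff_path):
    """Tesseract writes its text output to <output base>.txt."""
    base = os.path.splitext(tiff_path)[0]
    return ["tesseract", tiff_path, base]


def ocr_pdf(pdf_path, out_dir):
    """Return the OCR text of all pages, in page order (hypothetical helper)."""
    subprocess.check_call(pdf_to_tiff_cmd(pdf_path, out_dir))
    texts = []
    for tiff in sorted(glob.glob(os.path.join(out_dir, "page-*.tif"))):
        subprocess.check_call(tesseract_cmd(tiff))
        with open(os.path.splitext(tiff)[0] + ".txt") as f:
            texts.append(f.read())
    return "\n".join(texts)
```

The resulting text is the kind of content that ends up in the catalog's searchable text when OCR indexing is enabled.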

Searchable PDFs

Requires an svn checkout of Tesseract version 3.00 or 3.01 with the hOCR configuration in place. See this thread for how to configure hOCR: http://ubuntuforums.org/showthread.php?t=1647350

In addition, you'll need ExactImage, pdftk and libtiff-tools installed:

# sudo apt-get install exactimage pdftk libtiff-tools
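The sketch below shows one way these tools can be chained into a searchable PDF, assuming you already have one TIFF image per page (for example rendered with Ghostscript). Tesseract 3.00/3.01 with the `hocr` config writes `<base>.html`; ExactImage's `hocr2pdf` reads that markup on stdin and combines it with the page image; `pdftk ... cat output` joins the pages. The function names are hypothetical and this is not pdfpal's code.

```python
import os
import subprocess


def hocr_cmd(tiff_path):
    # With the "hocr" config, Tesseract 3.00/3.01 writes <base>.html
    base = os.path.splitext(tiff_path)[0]
    return ["tesseract", tiff_path, base, "hocr"]


def hocr2pdf_cmd(tiff_path, pdf_path):
    # hocr2pdf reads hOCR on stdin and produces a PDF with a text layer
    return ["hocr2pdf", "-i", tiff_path, "-o", pdf_path]


def pdftk_join_cmd(page_pdfs, out_pdf):
    # Concatenate the single-page PDFs in the given order
    return ["pdftk"] + list(page_pdfs) + ["cat", "output", out_pdf]


def searchable_pdf(tiff_pages, out_pdf):
    """Build a searchable PDF from per-page TIFFs (hypothetical helper)."""
    page_pdfs = []
    for tiff in tiff_pages:
        base = os.path.splitext(tiff)[0]
        subprocess.check_call(hocr_cmd(tiff))
        with open(base + ".html", "rb") as hocr:
            subprocess.check_call(hocr2pdf_cmd(tiff, base + ".pdf"), stdin=hocr)
        page_pdfs.append(base + ".pdf")
    subprocess.check_call(pdftk_join_cmd(page_pdfs, out_pdf))
```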

To use a Tesseract version older than the latest, you will have to add this to your instance declaration:

environment-vars += AUTHORIZE_OLD_TESSERACT_VERSION true
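Assuming your instance part uses the plone.recipe.zope2instance recipe, `environment-vars` takes lines of the form `NAME value` and exports them into the Zope process environment. How pdfpal parses the flag is not documented here; the check below is a hypothetical sketch of reading it from Python, treating only the value `true` (case-insensitive) as enabled.

```python
import os


def old_tesseract_authorized(environ=None):
    """Return True when AUTHORIZE_OLD_TESSERACT_VERSION is set to "true".

    Hypothetical helper for illustration; not pdfpal's actual check.
    """
    if environ is None:
        environ = os.environ
    value = environ.get("AUTHORIZE_OLD_TESSERACT_VERSION", "")
    return value.strip().lower() == "true"
```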

Plone 3

Requires hashlib

Extra

You can queue up all PDFs for conversion at once by calling the @@queue-up-all URL.
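To trigger the view from a script instead of a browser, a request like the one below can be used. It assumes the view is called on the site root, requires a logged-in manager, and that the site accepts HTTP Basic authentication; the site URL, credentials and the `queue_up_all_request` name are placeholders, not part of pdfpal.

```python
import base64
from urllib.request import Request, urlopen


def queue_up_all_request(site_url, user, password):
    """Build an authenticated GET request for the @@queue-up-all view."""
    url = site_url.rstrip("/") + "/@@queue-up-all"
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request = Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


# Sending it (requires a running Plone site):
# urlopen(queue_up_all_request("http://localhost:8080/Plone", "admin", "secret"))
```

Queued jobs can then be followed through @@async-monitor.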

Changelog

0.7b6 ~ 2012-04-20

fix uninstall [vangheem]

0.7b5 ~ 2012-04-19

do not run conversion if documentviewer is installed [vangheem]
add better uninstall support [vangheem]

0.7b4 ~ 2012-04-09

fix image url for album view. [vangheem]

0.7b3 ~ 2012-04-05

fix content type spec for thumbnail response [vangheem]
display image thumb URLs in album view [vangheem]

0.7b2 ~ 2011-04-12

more checks on reading files [vangheem]
provide button to manually index document [vangheem]
add ability to split pdf up into multiple PDFs [vangheem]

0.7b1 ~ 2011-01-06

fixes for quality and size issues [vangheem]

0.6b2 ~ 2011-01-04

fix async monitor view to work with plone.app.async 1.0, which changed the order of some args in the job. [vangheem]

0.6b1 ~ 2011-01-04

add ability to make PDFs searchable, working seamlessly with wc.pageturner: if it is installed, the FlexPaper version is created from the searchable PDF.

0.5b5 ~ 2010-12-07

did not conditionally import plone.app.async

0.5b4 ~ 2010-12-06

better info on async monitor
only reindex SearchableText when doing OCR so the modification date on the object does not get updated
make sure to catch exceptions so it doesn’t leave around files after a bad conversion
add colorbox for pdf folder view

0.5b3 ~ 2010-12-02

add ability to queue up all pdf files

0.5b2 ~ 2010-12-02

fix async monitor view

0.5b1 ~ 2010-12-02

Initial release

Release history

0.7b6 pre-release ~ Apr 20, 2012
0.7b5 pre-release ~ Apr 20, 2012
0.7b4 pre-release ~ Apr 9, 2012
0.7b3 pre-release ~ Apr 5, 2012
0.7b2 pre-release ~ Apr 12, 2011
0.7b1 pre-release ~ Jan 6, 2011
0.6b2 pre-release ~ Jan 4, 2011
0.6b1 pre-release ~ Jan 4, 2011
0.5b5 pre-release ~ Dec 7, 2010
0.5b4 pre-release ~ Dec 7, 2010
0.5b3 pre-release ~ Dec 2, 2010
0.5b2 pre-release ~ Dec 2, 2010
0.5b1 pre-release ~ Dec 2, 2010

Download files

Source Distribution

wildcard.pdfpal-0.7b6.zip (90.4 kB)

Uploaded Apr 20, 2012 Source

Hashes for wildcard.pdfpal-0.7b6.zip

Algorithm	Hash digest
SHA256	`e822c767a7dd18768d328bb6f96c338aa821601f77ae62647b5e7a2802ce00ad`
MD5	`3c9429bc826357d33d7c56ebbba23d7d`
BLAKE2b-256	`1caa1a1719c6de76e310ddfa7baf50cb23fca2c89e6c6a6b2281f1abc0b2ba1e`