wildcard.pdfpal 0.7b6
PDF Thumbnail generation, OCR indexing and extra views integrated with plone.app.async
Introduction
This package provides several integrations for PDF-heavy web sites:
- Generates thumbnails from PDFs
- Adds a folder view for PDFs that uses the generated thumbnails
- Adds OCR for PDF indexing
- Everything is configurable, so you can choose not to use thumbnail generation or OCR
- Ability to create searchable PDFs with HOCR
- Use the @@async-monitor URL to monitor asynchronous jobs that have yet to run
OCR
OCR requires Ghostscript and Tesseract to be installed. Use your package manager to install these packages:
# sudo apt-get install ghostscript tesseract-ocr
This will install Tesseract 2, not Tesseract 3.
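Since the packaged Tesseract may be older than expected, it can help to confirm which tools and which major version are actually on the PATH. The following is a minimal sketch, not part of wildcard.pdfpal: it assumes Python 3.7+ run as a standalone script, that Ghostscript is exposed as `gs`, and that `tesseract -v` prints a banner such as `tesseract 3.01` (very old releases may not support `-v`, in which case no version is reported). The helper names `parse_tesseract_major` and `check_ocr_tools` are hypothetical.

```python
import re
import shutil
import subprocess


def parse_tesseract_major(version_output):
    """Return the major version found in a Tesseract version banner, or None."""
    match = re.search(r"tesseract\s+v?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def check_ocr_tools():
    """Report which OCR dependencies are on PATH and the Tesseract major version."""
    # Assumption: Ghostscript is installed under the executable name "gs".
    found = {name: shutil.which(name) is not None for name in ("gs", "tesseract")}
    major = None
    if found["tesseract"]:
        # Some Tesseract releases print the banner to stderr,
        # so both streams are inspected.
        result = subprocess.run(
            ["tesseract", "-v"], capture_output=True, text=True
        )
        major = parse_tesseract_major(result.stdout + result.stderr)
    return found, major


if __name__ == "__main__":
    tools, major_version = check_ocr_tools()
    print("tools found:", tools)
    print("tesseract major version:", major_version)
```

If the reported major version is below 3, the searchable PDF feature described below will likely need a newer Tesseract build.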
Searchable PDFs
Requires an svn checkout of Tesseract version 3.01 or 3.00 with the hocr configuration in place. See this thread for how to configure hocr: http://ubuntuforums.org/showthread.php?t=1647350
In addition, you'll need exactimage and pdftk installed:
# sudo apt-get install exactimage pdftk libtiff-tools
If you are not using the latest Tesseract version, you will have to add this to your instance declaration:
environment-vars += AUTHORIZE_OLD_TESSERACT_VERSION true
Plone 3
- Requires hashlib
Extra
You can convert all at once by calling the url @@queue-up-all.
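The view can also be triggered from a script instead of a browser. The sketch below is illustrative only: the site URL, the credentials, and the assumption that the view accepts a plain GET request with HTTP Basic authentication are not stated by the package and may need adjusting for your site. The helper names `build_view_url` and `call_view` are hypothetical.

```python
import base64
import urllib.request


def build_view_url(site_url, view_name="queue-up-all"):
    """Join a Plone site URL and a browser view name into a '@@view' URL."""
    return site_url.rstrip("/") + "/@@" + view_name


def call_view(site_url, username, password, view_name="queue-up-all"):
    """Request the view with HTTP Basic auth and return the HTTP status code.

    Assumption: the site accepts Basic auth for this user on a GET request.
    """
    request = urllib.request.Request(build_view_url(site_url, view_name))
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `call_view("http://localhost:8080/Plone", "admin", "secret")` would request `http://localhost:8080/Plone/@@queue-up-all`; passing `view_name="async-monitor"` targets the job monitor instead.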
Changelog
0.7b6 ~ 2012-04-20
- fix uninstall [vangheem]
0.7b5 ~ 2012-04-19
- do not run conversion if documentviewer is installed [vangheem]
- add better uninstall support [vangheem]
0.7b4 ~ 2012-04-09
- fix image url for album view. [vangheem]
0.7b3 ~ 2012-04-05
- fix content type spec for thumbnail response [vangheem]
- display image thumb urls in album view [vangheem]
0.7b2 ~ 2011-04-12
- more checks on reading files [vangheem]
- provide button to manually index document [vangheem]
- add ability to split pdf up into multiple PDFs [vangheem]
0.7b1 ~ 2011-01-06
- fixes for quality and size issues [vangheem]
0.6b2 ~ 2011-01-04
- fix async monitor view to work with plone.app.async = 1.0, which changed the order of some args in the job. [vangheem]
0.6b1 ~ 2011-01-04
- added ability to make PDFs searchable and make it work seamlessly if wc.pageturner is installed so flex paper is created with the searchable PDF version.
0.5b5 ~ 2010-12-07
- did not conditionally import plone.app.async
0.5b4 ~ 2010-12-06
- better info on async monitor
- only reindex searchabletext when doing OCR so the modification date on the object does not get set.
- make sure to catch exceptions so it doesn't leave around files after a bad conversion
- add colorbox for pdf folder view
0.5b3 ~ 2010-12-02
- add ability to queue up all pdf files
0.5b2 ~ 2010-12-02
- fix async monitor view
0.5b1 ~ 2010-12-02
- Initial release
| File | Type | Py Version | Uploaded on | Size | # downloads |
|---|---|---|---|---|---|
| wildcard.pdfpal-0.7b6.zip (md5) | Source | | 2012-04-20 | 88KB | 123 |
- Author: Nathan Van Gheem
- Home Page: http://pypi.python.org/pypi/wildcard.pdfpal
- Keywords: pdf plone thumbnail ocr async
- License: GPL
- Package Index Owner: vangheem
- DOAP record: wildcard.pdfpal-0.7b6.xml
