pdfminer 20091024
PDF parser and analyzer
PDFMiner is a suite of programs that help extract and analyze text data from PDF documents. Unlike other PDF-related tools, it lets you obtain the exact location of text on a page, as well as other information such as fonts or ruled lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It also has an extensible PDF parser that can be used for purposes other than text analysis.
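As a rough illustration of what "exact location of text" means in practice, the sketch below lists each text box on a page together with its bounding box. It is an assumption-laden example, not this release's documented interface: it uses `extract_pages` and `LTTextContainer` from the later pdfminer.six fork, whose API differs from the 20091024 release, and the helper names `iter_text_boxes` and `format_box` are hypothetical.

```python
import sys
from typing import Iterator, Tuple

# (x0, y0, x1, y1) in PDF points, measured from the bottom-left of the page.
BBox = Tuple[float, float, float, float]


def iter_text_boxes(pdf_path: str) -> Iterator[Tuple[int, BBox, str]]:
    """Yield (page_number, bounding_box, text) for each text box in a PDF.

    Assumption: relies on the pdfminer.six API, not the 20091024 release.
    """
    # Imported lazily so format_box() below works without the library.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer

    for page_number, page_layout in enumerate(extract_pages(pdf_path), start=1):
        for element in page_layout:
            # Only text containers carry text; skip figures, lines, etc.
            if isinstance(element, LTTextContainer):
                yield page_number, element.bbox, element.get_text().strip()


def format_box(page_number: int, bbox: BBox, text: str) -> str:
    """Render one text box as a single human-readable line."""
    x0, y0, x1, y1 = bbox
    return f"p{page_number} ({x0:.1f}, {y0:.1f})-({x1:.1f}, {y1:.1f}): {text!r}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_boxes.py some_document.pdf
    for page_number, bbox, text in iter_text_boxes(sys.argv[1]):
        print(format_box(page_number, bbox, text))
```

Keeping the formatting separate from the extraction means the output format can be changed (for example to CSV) without touching the part that depends on the library.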
| File | Type | Py Version | Uploaded on | Size | # downloads |
|---|---|---|---|---|---|
| pdfminer-20091024.tar.gz (md5) | Source | | 2009-10-26 16:25:36.056670 | 1MB | 47 |
- Author: Yusuke Shinyama <yusuke at cs dot nyu dot edu>
- Home Page: http://www.unixuser.org/~euske/python/pdfminer/index.html
- Keywords: pdf parser,pdf converter,text mining
- License: MIT/X
- Package Index Owner: euske
- DOAP record: pdfminer-20091024.xml
Package rating (1 vote): 4.0
- 4 points: 1 vote
Ratings range from 0 to 5 (best).
Package Comments:
- I found the package very helpful. It is the best open-source solution I have found for processing PDFs!
The claim "Unlike other PDF-related tools, it allows to obtain the exact location of texts in a page, ..." is true and very important to many applications.
It misses ligatures (e.g. "fi" -> "?") and sometimes can't process a PDF at all. Also, the layout analysis could use much more work. (timv, 2009-11-17,points)
