
pdfminer 20091024

PDF parser and analyzer

PDFMiner is a suite of programs that help extract and analyze text data from PDF documents. Unlike other PDF-related tools, it can obtain the exact location of text on a page, as well as extra information such as font information and ruled lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It also has an extensible PDF parser that can be used for purposes other than text analysis.

File: pdfminer-20091024.tar.gz
Type: Source
Uploaded on: 2009-10-26 16:25:36
Size: 1MB
Downloads: 47


Package rating (1 vote): 4.0
  • 4 points: 1 vote

Ratings range from 0 to 5 (best).

Package Comments:
  • I found the package very helpful. It is the best open-source solution I have found for processing PDFs!

    The claim "Unlike other PDF-related tools, it allows to obtain the exact location of texts in a page, ..." is true and very important to many applications.

    It misses ligatures (e.g. "fi" -> "?") and sometimes can't process a PDF at all. Also, the layout analysis could use a lot of work. (timv, 2009-11-17, points)
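The ligature problem mentioned in this comment can sometimes be worked around after extraction. Below is a minimal sketch of that workaround. It assumes the extracted text still contains the Unicode ligature characters themselves, such as U+FB01 "ﬁ", rather than a literal "?". If the glyph has already been replaced by "?", the information is lost and this will not help. The helper name `expand_ligatures` is hypothetical and is not part of PDFMiner.

```python
import unicodedata


def expand_ligatures(text: str) -> str:
    """Replace typographic ligature code points with plain letters.

    Hypothetical post-processing helper, not a PDFMiner API.
    NFKC normalization maps compatibility characters such as
    U+FB01 LATIN SMALL LIGATURE FI to "f" + "i". It also rewrites
    other compatibility characters (e.g. full-width letters,
    superscript digits), which may or may not be desirable.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    # "\ufb01" is the fi ligature and "\ufb02" is the fl ligature
    sample = "\ufb01nd the \ufb02ow"
    print(expand_ligatures(sample))  # prints: find the flow
```

NFKC is used here because it is in the standard library and covers all the Latin ligatures in the U+FB00 block. A narrower explicit mapping table would avoid touching unrelated characters.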