pdfminer3k 1.2.4
PDF parser and analyzer
pdfminer3k is a Python 3 port of pdfminer. PDFMiner is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text data. PDFMiner lets you obtain the exact location of text on a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can also be used for purposes other than text analysis.
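As a rough illustration of getting text together with its position on the page, the following sketch walks the layout of each page and yields every text box with its bounding box. It assumes pdfminer3k is installed and exposes the module layout commonly shown for this port (`pdfminer.pdfparser`, `pdfminer.pdfinterp`, `pdfminer.converter`, `pdfminer.layout`); the helper names `extract_text_boxes` and `describe_box` are hypothetical and not part of the package.

```python
def extract_text_boxes(path, password=""):
    """Yield (page_number, bbox, text) for each text box found in the PDF.

    Assumption: the pdfminer3k API as commonly documented for this port.
    """
    # Imported lazily so this module still loads where pdfminer3k is absent.
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBox

    with open(path, "rb") as fp:
        # Parser and document are linked to each other before initialization.
        parser = PDFParser(fp)
        doc = PDFDocument()
        parser.set_document(doc)
        doc.set_parser(parser)
        doc.initialize(password)

        # The aggregator device collects layout objects for each page.
        rsrcmgr = PDFResourceManager()
        device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        for page_number, page in enumerate(doc.get_pages(), start=1):
            interpreter.process_page(page)
            layout = device.get_result()
            for item in layout:
                if isinstance(item, LTTextBox):
                    # bbox is (x0, y0, x1, y1) in page coordinates.
                    yield page_number, item.bbox, item.get_text()


def describe_box(page_number, bbox, text):
    """Format one text box as a single readable line (hypothetical helper)."""
    x0, y0, x1, y1 = bbox
    return "page %d (%.1f, %.1f)-(%.1f, %.1f): %s" % (
        page_number, x0, y0, x1, y1, text.strip())
```

A caller might loop over `extract_text_boxes("example.pdf")` and print `describe_box(*box)` for each result; the file name here is only a placeholder.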
Changes
Version 1.2.4 -- 2011/10/07
- When xref tables are corrupt, parse and cache all objects as a fallback.
- Fixed a bogus assertion in layouts.
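The xref fallback mentioned for this version can be pictured with a small standard-library sketch: when the cross-reference table cannot be trusted, scan the raw bytes for object headers and build an index from them. This is not pdfminer3k's actual implementation; the function name `scan_objects` and the regular expression are assumptions made for illustration, and a naive scan like this can be fooled by header-like bytes inside streams.

```python
import re

# Matches an object header of the form "<number> <generation> obj".
OBJ_HEADER = re.compile(rb"(?<!\d)(\d+)\s+(\d+)\s+obj\b")


def scan_objects(data):
    """Map (object number, generation) to the byte offset of its header.

    Later definitions overwrite earlier ones, since incremental updates
    append newer copies of an object toward the end of the file.
    """
    offsets = {}
    for match in OBJ_HEADER.finditer(data):
        key = (int(match.group(1)), int(match.group(2)))
        offsets[key] = match.start()
    return offsets


if __name__ == "__main__":
    sample = (b"%PDF-1.4\n"
              b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
              b"2 0 obj\n<< >>\nendobj\n")
    for key, offset in sorted(scan_objects(sample).items()):
        print(key, offset)
```

An index built this way can stand in for the broken table when resolving object references, at the cost of reading the whole file once.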
Version 1.2.3 -- 2011/09/05
- Fixed a crash on uneven cmap codes.
- Fixed a meta-crash caused by bad PSParser repr.
Version 1.2.2 -- 2011/08/30
- Fixed crash on corrupt LZW data.
- Ignore lines with no text for textlines grouping.
- Don't crash on invalid dictionary constructs when parsing PostScript.
Version 1.2.1 -- 2011/08/22
- Fixed a crash on corrupted inline images.
- Tweaked the layout detection algorithm.
Version 1.2.0 -- 2011/08/09
- There wasn't a changelog until now. Starting it.
- Removed the old PostScript lexer and replaced it with a PLY-based one.
- Added a couple of heuristic layout features.
- Fixed a couple of crashes on opening PDFs.
| File | Type | Py Version | Uploaded on | Size | # downloads |
|---|---|---|---|---|---|
| pdfminer3k-1.2.4.tar.gz (md5) | Source | | 2011-10-07 | 9MB | 375 |
- Author: Virgil Dupras
- Home Page: http://bitbucket.org/hsoft/pdfminer3k
- Keywords: pdf parser,pdf converter,layout analysis,text mining
- License: MIT/X
- Package Index Owner: hsoft
- DOAP record: pdfminer3k-1.2.4.xml
