piculet

XML/HTML scraper using XPath queries.

Project description

Piculet is a module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file with no dependencies other than the standard library, which makes it very easy to integrate into applications. It also provides a command line interface.
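
To make the idea of XPath-driven extraction concrete, here is a minimal sketch that uses only the standard library's `xml.etree.ElementTree`. It does not use Piculet's own API or specification format: the `QUERIES` mapping, the `extract` function, and the sample document are all hypothetical names invented for this illustration. ElementTree also supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Sample well-formed document; invented for this illustration.
DOCUMENT = """
<html>
  <body>
    <h1>The Shining</h1>
    <ul class="cast">
      <li>Jack Nicholson</li>
      <li>Shelley Duvall</li>
    </ul>
  </body>
</html>
"""

# Hypothetical mapping of output keys to XPath queries.
# This is NOT Piculet's specification format.
QUERIES = {
    "title": ".//h1",
    "cast": ".//ul[@class='cast']/li",
}


def extract(document, queries):
    """Return a dict mapping each key to the text of all nodes matching its query."""
    root = ET.fromstring(document.strip())
    result = {}
    for key, path in queries.items():
        # Collect the text of every matching node, skipping nodes without text.
        texts = [node.text for node in root.findall(path) if node.text is not None]
        if texts:
            result[key] = texts
    return result


data = extract(DOCUMENT, QUERIES)
print(data)
```

Each key maps to a list of matched texts; turning such a list into a single value is the job of a reducing step, which the sketch leaves out.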

PyPI: https://pypi.python.org/pypi/piculet/
Repository: https://bitbucket.org/uyar/piculet
Documentation: https://piculet.readthedocs.io/

Piculet has been tested with Python 2.7, Python 3.4+, PyPy2 5.7+, and PyPy3 5.7+. You can install the latest version using pip:

pip install piculet

History

1.0b7 (2018-03-21)

Dropped support for Python 3.3.
Fixes for handling Unicode data in HTML for Python 2.
Added registry for preprocessors.

1.0b6 (2018-01-17)

Support for writing specifications in YAML.

1.0b5 (2018-01-16)

Added a class-based API for writing specifications.
Added predefined transformation functions.
Removed callables from specification maps. Use the new API instead.
Added support for registering new reducers and transformers.
Added support for defining sections in document.
Refactored XPath evaluation method in order to parse path expressions once.
Preprocessing will be done only once when the tree is built.
Concatenation is now the default reducing operation.
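
The entries above mention registering reducers and transformers and making concatenation the default reducing operation. The following sketch shows one way such a registry could work in general. It is an assumption for illustration, not Piculet's implementation: `REDUCERS`, `TRANSFORMERS`, `register`, and `apply_rule` are hypothetical names.

```python
# Hypothetical registries mapping names to callables (not Piculet's API).
REDUCERS = {}
TRANSFORMERS = {}


def register(registry, name):
    """Return a decorator that stores the decorated function under the given name."""
    def decorator(func):
        registry[name] = func
        return func
    return decorator


@register(REDUCERS, "first")
def first(texts):
    # Keep only the first matched text.
    return texts[0]


@register(REDUCERS, "concat")
def concat(texts):
    # Join all matched texts into one string.
    return "".join(texts)


@register(TRANSFORMERS, "int")
def to_int(value):
    return int(value)


def apply_rule(texts, reducer="concat", transformer=None):
    """Reduce a list of texts to one value, then optionally transform it."""
    value = REDUCERS[reducer](texts)
    if transformer is not None:
        value = TRANSFORMERS[transformer](value)
    return value


print(apply_rule(["19", "80"], transformer="int"))
print(apply_rule(["a", "b"], reducer="first"))
```

Looking callables up by name keeps a specification free of function objects, which is one plausible reason to prefer a registry when specifications are written as plain data.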

1.0b4 (2018-01-02)

Added “--version” option to command line arguments.
Added option to force the use of lxml’s HTML builder.
Fixed the error where non-truthy values would be excluded from the result.
Added support for transforming node text during preprocess.
Added separate preprocessing function to API.
Renamed the “join” reducer as “concat”.
Renamed the “foreach” keyword for keys as “section”.
Removed some low level debug messages to substantially increase speed.
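
The fix for excluded non-truthy values points at a common Python pitfall: filtering results with a truthiness test drops legitimate values such as `0` or an empty string. The changelog does not say how the error arose in Piculet, so the sketch below only illustrates the general pattern; `collect_buggy` and `collect_fixed` are hypothetical names.

```python
def collect_buggy(values):
    """Keep only truthy values; this wrongly drops 0 and ''."""
    return {key: value for key, value in values.items() if value}


def collect_fixed(values):
    """Keep every value except None, which marks 'nothing extracted' here."""
    return {key: value for key, value in values.items() if value is not None}


# Sample extraction result; invented for this illustration.
extracted = {"title": "Example", "votes": 0, "tagline": "", "year": None}

print(collect_buggy(extracted))
print(collect_fixed(extracted))
```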

1.0b3 (2017-07-25)

Removed the caching feature.

1.0b2 (2017-06-16)

Added helper function for getting cache hash keys of URLs.

1.0b1 (2017-04-26)

Added optional value transformations.
Added support for custom reducer callables.
Added command-line option for scraping documents from local files.

1.0a2 (2017-04-04)

Added support for Python 2.7.
Fixed lxml support.

1.0a1 (2016-08-24)

First release on PyPI.

Release history

2.0.0a1 pre-release

Jul 23, 2019

2.0.0a0 pre-release

Jun 28, 2019

1.0.1

Feb 7, 2019

1.0

May 25, 2018

This version

1.0b7 pre-release

Mar 21, 2018

1.0b6 pre-release

Jan 17, 2018

1.0b5 pre-release

Jan 16, 2018

1.0b4 pre-release

Jan 2, 2018

1.0b3 pre-release

Jul 25, 2017

1.0b2 pre-release

Jun 16, 2017

1.0b1 pre-release

Apr 26, 2017

1.0a2 pre-release

Apr 4, 2017

1.0a1 pre-release

Aug 24, 2016

Download files

Download the file for your platform.

Source Distribution

piculet-1.0b7.tar.gz (32.8 kB)

Uploaded Mar 21, 2018 Source

Built Distribution

piculet-1.0b7-py2.py3-none-any.whl (13.9 kB)

Uploaded Mar 21, 2018 Python 2 Python 3

Hashes for piculet-1.0b7.tar.gz
Algorithm	Hash digest
SHA256	`caf5024fbd8bf8ec95e52e582226d823393c013e4eba65f6cb5cc232da9e5c1e`
MD5	`79b119cc6d51e4e507b10ab44b14677e`
BLAKE2b-256	`0f6adeb3549e0882c775f58579f93da222884c6c01decfeb1f97c4c0f0827dc8`

Hashes for piculet-1.0b7-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`c14ae0bd8bf0e1ba771c8c2532fcea3259f2133c89a1f85257a52abf1d8e210e`
MD5	`08bab36fcdb7f9177513abdaeeae1fa2`
BLAKE2b-256	`f93f688365b8255c035d815887d56a68e4b21a509d37c860f32202f23f971644`
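
A downloaded file can be checked against the SHA256 digests listed above with the standard library's `hashlib`. The sketch below assumes the files sit in the current directory under the names shown; `sha256_of_file` and `verify` are helper names chosen for this illustration.

```python
import hashlib
import os

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "piculet-1.0b7.tar.gz":
        "caf5024fbd8bf8ec95e52e582226d823393c013e4eba65f6cb5cc232da9e5c1e",
    "piculet-1.0b7-py2.py3-none-any.whl":
        "c14ae0bd8bf0e1ba771c8c2532fcea3259f2133c89a1f85257a52abf1d8e210e",
}


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    for filename, expected in EXPECTED_SHA256.items():
        # Only check files that are actually present in the current directory.
        if os.path.exists(filename):
            status = "OK" if verify(filename, expected) else "MISMATCH"
            print(filename, status)
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for archives this small but costs nothing.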