XML/HTML scraper using XPath queries.
Piculet is a module for extracting data from XML or HTML documents using XPath queries. It consists of a single source file with no dependencies other than the standard library, which makes it very easy to integrate into applications. It also provides a command line interface.
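The core idea is to address parts of a document with path queries and read off their text. The sketch below illustrates this with only the standard library's `xml.etree.ElementTree`. ElementTree supports just a subset of XPath (no `text()` or XPath functions), so this shows the general technique and is not Piculet's own API. The sample document is a hypothetical stand-in, not the contents of `shining.html`.

```python
import xml.etree.ElementTree as ET

# A small hypothetical document; it is NOT the contents of shining.html.
DOC = """<movie>
  <title>The Shining</title>
  <year>1980</year>
  <cast>
    <actor role="Jack Torrance">Jack Nicholson</actor>
    <actor role="Wendy Torrance">Shelley Duvall</actor>
  </cast>
</movie>"""

root = ET.fromstring(DOC)

# findtext() returns the text of the first element matching the path.
title = root.findtext("title")
year = int(root.findtext("year"))

# findall() returns every matching element, in document order.
actors = [actor.text for actor in root.findall("cast/actor")]

# ElementTree's XPath subset supports attribute predicates like [@role='...'].
wendy = root.find(".//actor[@role='Wendy Torrance']").text
```

Here `title` is `"The Shining"`, `year` is `1980`, `actors` lists both names and `wendy` is `"Shelley Duvall"`.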
Getting started
Piculet has been tested with Python 3.5+ and compatible versions of PyPy. You can install the latest version using pip:
pip install piculet
Installing Piculet creates a script named piculet which can be used to invoke the command line interface:
$ piculet -h
usage: piculet [-h] [--version] [--html] (-s SPEC | --h2x)
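The usage line shows an `--html` flag and an `--h2x` option next to `-s SPEC`; going by their names, they deal with HTML input and with converting HTML to XHTML. HTML is often not well-formed XML (void elements such as `<br>`, boolean attributes, unclosed tags), so it must be turned into a proper tree before path queries can run on it. The following is a minimal standard-library sketch of one way to do that, feeding `html.parser.HTMLParser` events into an `xml.etree.ElementTree.TreeBuilder`. It is an illustration under these assumptions, not Piculet's converter; `HTMLTreeBuilder` and `html_to_tree` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# HTML void elements never take an end tag, so they are closed immediately.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class HTMLTreeBuilder(HTMLParser):
    """Turn HTML markup into an ElementTree element (hypothetical sketch).

    It closes void elements, drops stray end tags, closes elements left open,
    and gives boolean attributes XHTML-style values (checked="checked").
    It does NOT implement HTML's implied end-tag rules, e.g. an open <p>
    is not closed automatically when another <p> starts.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        attributes = {name: name if value is None else value for name, value in attrs}
        self.builder.start(tag, attributes)
        if tag in VOID_ELEMENTS:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag, or the end tag of a void element
        # close any elements still open inside this one, then the element itself
        while True:
            open_tag = self.open_tags.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def close_tree(self):
        self.close()  # process any input still buffered by the parser
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def html_to_tree(markup):
    parser = HTMLTreeBuilder()
    parser.feed(markup)
    return parser.close_tree()


root = html_to_tree('<html><body><p class="intro">Hello<br>world</p>'
                    '<input type="checkbox" checked></body></html>')
xhtml = ET.tostring(root, encoding="unicode")
```

Serializing the result with `ET.tostring` yields well-formed markup (for example `<br />` and `checked="checked"`), which an XML parser can read back.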
For example, say you want to extract some data from the file shining.html. An example specification is given in movie.json. Download both of these files and run the command:
$ cat shining.html | piculet -s movie.json
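In this command the specification file drives the extraction: it names the values to collect and the queries that locate them. The sketch below imitates that pattern with the standard library. The spec format here (key to `path`, optional `type`, optional `many`) is hypothetical and is NOT the schema used by `movie.json`; `scrape`, `SPEC_JSON` and `CONVERTERS` are likewise illustrative names, and the paths use ElementTree's XPath subset.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical spec format for this sketch only (NOT the schema of movie.json):
# each key maps to an ElementTree path, an optional "type", and an optional
# "many" flag that keeps every match instead of just the first one.
SPEC_JSON = """
{
    "title": {"path": "./title"},
    "year": {"path": "./year", "type": "int"},
    "cast": {"path": "./cast/actor", "many": true},
    "director": {"path": "./director"}
}
"""

# Converters applied to the extracted text of each matching node.
CONVERTERS = {"str": str, "int": int}


def scrape(document, spec):
    """Apply every rule in `spec` to the XML `document` and collect the results."""
    root = ET.fromstring(document)
    data = {}
    for key, rule in spec.items():
        convert = CONVERTERS[rule.get("type", "str")]
        # itertext() joins the text of a node and all of its descendants
        values = [convert("".join(node.itertext()).strip())
                  for node in root.findall(rule["path"])]
        if rule.get("many", False):
            data[key] = values  # may be an empty list
        elif values:
            data[key] = values[0]
        # a single-valued key whose path matches nothing is left out
    return data


DOCUMENT = """<movie>
  <title>The Shining</title>
  <year>1980</year>
  <cast>
    <actor>Jack Nicholson</actor>
    <actor>Shelley Duvall</actor>
  </cast>
</movie>"""

movie = scrape(DOCUMENT, json.loads(SPEC_JSON))
```

With this input, `movie` contains the title, the year as an integer and the cast list, while `director` is omitted because nothing matches it. A command-line wrapper could read the document from `sys.stdin` and the spec with `json.load`, mirroring the pipeline above.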
Getting help
The documentation is available at: https://piculet.tekir.org/
The source code can be obtained from: https://github.com/uyar/piculet
License
Copyright (C) 2014-2019 H. Turgut Uyar <uyar@tekir.org>
Piculet is released under the LGPL license, version 3 or later. Read the included LICENSE.txt file for details.
Download files
Built Distribution
Hashes for piculet-2.0.0a1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | dcf927dd99016aa45ac425fac3e7664aad618af949225a08a6d742c8c7efdafa
MD5 | 1809399be4239df479445e2ec17b0137
BLAKE2b-256 | 602954825cb837d7123b8dbf7f6bc7b645d3ba43aab771c101666567ff57d314
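These digests let you check that a downloaded wheel is intact. A minimal sketch using Python's `hashlib` follows (`blake2b` needs Python 3.6 or later); `file_digests` and `matches_published` are hypothetical helper names, and the path passed in is assumed to point at the downloaded wheel.

```python
import hashlib

# SHA256 digest published above for piculet-2.0.0a1-py3-none-any.whl.
PUBLISHED_SHA256 = "dcf927dd99016aa45ac425fac3e7664aad618af949225a08a6d742c8c7efdafa"


def file_digests(path, chunk_size=64 * 1024):
    """Compute the three digests listed in the table for the file at `path`."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # read in chunks so large files need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, expected_sha256=PUBLISHED_SHA256):
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return file_digests(path)["SHA256"] == expected_sha256.lower()
```

For example, `matches_published("piculet-2.0.0a1-py3-none-any.whl")` should return `True` for an intact download. pip's hash-checking mode (a `--hash=sha256:...` option in a requirements file) performs an equivalent check during installation.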