take · PyPI

A DSL for extracting data from a web page.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

A DSL for extracting data from a web page. The DSL serves two purposes: finding elements and extracting their text or attribute values. The main reason for developing it is to keep all the CSS selectors for scraping a site in one place.

The DSL wraps PyQuery.

Example

Given the following take template:

$ h1 | text
    save: h1_title
$ ul
    save each: uls
        $ li
            | 0 [title]
                save: title
            | 1 text
                save: second_li
$ p | 1 text
    save: p_text

And the following HTML:

<div>
    <h1>Le Title 1</h1>
    <p>Some body here</p>
    <h2 class="second title">The second title</h2>
    <p>Another body here</p>
    <ul id="a">
        <li title="a less than awesome title">A first li</li>
        <li>A second li</li>
        <li>A third li</li>
    </ul>
    <ul id="b">
        <li title="some awesome title">B first li</li>
        <li>B second li</li>
        <li>B third li</li>
    </ul>
</div>

The following data will be extracted (presented in JSON format):

{
    "h1_title": "Le Title 1",
    "p_text": "Another body here",
    "uls": [
        {
            "title": "a less than awesome title",
            "second_li": "A second li"
        },
        {
            "title": "some awesome title",
            "second_li": "B second li"
        }
    ]
}

Take templates always result in a single Python dict.
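To make the template's semantics concrete, the sketch below performs the same extraction by hand. It is not take's API or implementation (take wraps PyQuery); it uses only the standard library's `xml.etree.ElementTree`, which works here only because the sample HTML happens to be well-formed XML. The function name `extract` and the constant `HTML` are illustrative assumptions, and each step is annotated with the template line it mirrors.

```python
import json
import xml.etree.ElementTree as ET

# The sample HTML from above; it is well-formed, so an XML parser can read it.
HTML = """<div>
    <h1>Le Title 1</h1>
    <p>Some body here</p>
    <h2 class="second title">The second title</h2>
    <p>Another body here</p>
    <ul id="a">
        <li title="a less than awesome title">A first li</li>
        <li>A second li</li>
        <li>A third li</li>
    </ul>
    <ul id="b">
        <li title="some awesome title">B first li</li>
        <li>B second li</li>
        <li>B third li</li>
    </ul>
</div>"""


def extract(html):
    """Hand-written equivalent of the take template (illustrative only)."""
    root = ET.fromstring(html)
    data = {}

    # $ h1 | text            -> save: h1_title
    data["h1_title"] = root.find(".//h1").text

    # $ ul                   -> save each: uls (one dict per matched <ul>)
    uls = []
    for ul in root.iter("ul"):
        lis = ul.findall(".//li")
        uls.append({
            # | 0 [title]    -> title attribute of the first <li>
            "title": lis[0].get("title"),
            # | 1 text       -> text of the second <li>
            "second_li": lis[1].text,
        })
    data["uls"] = uls

    # $ p | 1 text           -> text of the second <p> (indexes are zero-based)
    data["p_text"] = root.findall(".//p")[1].text
    return data


result = extract(HTML)
print(json.dumps(result, indent=4))
```

The point of the DSL is that the selectors, indexes, and save names in the comments above live together in one template instead of being scattered through code like this.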

For a more complex example, see the reddit sample.

Project details

Release history

Version	Release date
0.2.0	Apr 15, 2015
0.1.0	Apr 6, 2015
0.0.6	Apr 6, 2015
0.0.5	Apr 6, 2015
0.0.4	Mar 27, 2015
0.0.3	Mar 25, 2015
0.0.2	Mar 25, 2015
0.0.1	Mar 25, 2015
0.0.0 (this version)	Mar 25, 2015

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

take-0.0.0.tar.gz (8.9 kB)

Uploaded Mar 25, 2015 Source

Hashes for take-0.0.0.tar.gz

Algorithm	Hash digest
SHA256	`d47d67a3d723b4e5144e45c88dbf05317ecdddc362e7cdd3325fe7e4e7cc6879`
MD5	`3a4a348fee9eb81448b8b6b3592269ec`
BLAKE2b-256	`8445f82d9aadea3033e8a9b968b6fa3a646635b03bbedf89471b3962e21ac3fa`
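
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib`; the helper name `sha256_of` and the assumption that the file sits in the current directory are illustrative, not part of the project.

```python
import hashlib
import os

# SHA256 digest published for take-0.0.0.tar.gz in the table above.
EXPECTED_SHA256 = "d47d67a3d723b4e5144e45c88dbf05317ecdddc362e7cdd3325fe7e4e7cc6879"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed local filename; the check is skipped if the file is not present.
archive = "take-0.0.0.tar.gz"
if os.path.exists(archive):
    if sha256_of(archive) == EXPECTED_SHA256:
        print("SHA256 matches")
    else:
        print("SHA256 mismatch: do not install this file")
```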