Project description

PyCrawl

Script for crawling in Python.

Description

This project enables site crawling and data extraction with XPath and CSS selectors. You can also submit forms, including text fields, file uploads, and checkboxes.
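To make the selector idea concrete, here is a minimal sketch of XPath-style extraction that uses only Python's standard library. It does not use PyCrawl and is not its implementation. The sample markup and variable names are hypothetical, and `xml.etree.ElementTree` supports only a subset of XPath on well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup (not taken from any real site).
SAMPLE = """
<html>
  <body>
    <div id="top">
      <div><a href="/first">First</a></div>
      <div><a href="/second">Second</a></div>
    </div>
    <p class="main-text">Hello</p>
  </body>
</html>
"""

root = ET.fromstring(SAMPLE)

# XPath-style lookup: first <div> child of the element whose id is "top".
first_div = root.find(".//*[@id='top']/div[1]")

# Rough equivalent of the CSS selector "p.main-text" for an exact class match.
main_text = root.find(".//p[@class='main-text']")

# Collect every link target, similar to reading an href attribute per node.
hrefs = [a.get("href") for a in root.iter("a")]

print(first_div.find("a").text)
print(main_text.text)
print(hrefs)
```

Real pages are rarely well-formed XML, which is presumably why a lenient HTML parser such as lxml appears in the requirements.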

Requirements

Python 3
mechanize
lxml

Usage

Simple Example

import pycrawl

url = 'http://www.example.com/'
doc = pycrawl.PyCrawl(url)

# Search for nodes by css
doc.css('div')
doc.css('.main-text')
doc.css('#tadjs')

# Search for nodes by xpath
doc.xpath('//*[@id="top"]/div[1]')

# Others
doc.css('div').css('a')[2].attr('href')
doc.css('p').innerText()
doc.tables  # -> table tags converted to a dict

# You can omit "[0]" when accessing the first element
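The example above notes that `doc.tables` turns table tags into a dict, but the exact shape of that dict is not documented here. As a rough illustration only, the sketch below converts a header/value table into a dict with the standard library. The `TableParser` class, the `table_to_dict` function, and the "first cell is the key, second cell is the value" layout are all assumptions made for this example, not PyCrawl's actual behavior.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every table row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_dict(html):
    """Map the first cell of each row to the second cell (assumed layout)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) >= 2}


# Hypothetical sample table.
SAMPLE = """
<table>
  <tr><th>Name</th><td>PyCrawl</td></tr>
  <tr><th>Language</th><td>Python</td></tr>
</table>
"""

result = table_to_dict(SAMPLE)
print(result)
```

Rows with fewer than two cells are skipped, so an empty table yields an empty dict.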

Release history

Version	Release date
3.1.0	Mar 29, 2021
2.6.0	May 18, 2020
2.5.0	May 18, 2020
2.4.1	May 12, 2020
2.4.0	May 6, 2020
2.3.2	May 3, 2020
2.3.1	May 3, 2020
2.3.0	May 3, 2020
2.2.1	May 3, 2020
2.2.0	May 3, 2020
2.1.0	May 3, 2020
2.0.0	May 3, 2020
1.1.0	Jan 11, 2020
1.0.1	Jan 10, 2020
1.0.0	Jan 10, 2020 (this version)

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

pycrawl-1.0.0-py3-none-any.whl (4.6 kB)

Uploaded Jan 10, 2020 Python 3

Hashes for pycrawl-1.0.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`e6e4bcaf75ab3b405e1eed358dc5d951f080bfa79d72b63efcdc18c3065c6d04`
MD5	`5d11f144be0d6a020add6de53b124ca2`
BLAKE2b-256	`bbf0824d0cf6d59576d4692fd153bd0804bcb96d858c65fbe61e1006434db377`