PyCrawl
A Python script for crawling websites.
Description
PyCrawl crawls websites and extracts data with XPath and CSS selectors. It can also submit forms containing text fields, file uploads, and checkboxes.
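This README does not document PyCrawl's form-submission calls. As background only, here is a sketch of how a form with a text field and a checkbox group is typically encoded as a URL-encoded POST body. It uses only the standard library's `urllib.parse`, and the field names `name` and `topics` are hypothetical. File uploads use `multipart/form-data` encoding instead, which this sketch does not cover.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical form: one text input and a checkbox group with two boxes ticked.
# Unchecked checkboxes are simply left out, as browsers do.
fields = {
    "name": "Alice",
    "topics": ["python", "crawling"],  # checked boxes in a group share one name
}

# doseq=True emits one key=value pair per list item
body = urlencode(fields, doseq=True)
print(body)  # name=Alice&topics=python&topics=crawling

# Decoding the body recovers every checked value
print(parse_qs(body))  # {'name': ['Alice'], 'topics': ['python', 'crawling']}
```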
Requirements
- Python 3
- mechanize
- lxml
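Both third-party packages can be installed with `pip install mechanize lxml`. To confirm they are importable before running PyCrawl, a small check like the one below may help. It is a sketch that only tests whether each module can be found on the import path. It does not check versions, and the helper name `missing_requirements` is made up for this example.

```python
import importlib.util

# Third-party packages listed under "Requirements"; their import names match
REQUIRED = ("mechanize", "lxml")


def missing_requirements(modules=REQUIRED):
    """Return the names of modules that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements found")
```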
Usage
Simple Example
import pycrawl
url = 'http://www.example.com/'
doc = pycrawl.PyCrawl(url)
# Search for nodes by css
doc.css('div')
doc.css('.main-text')
doc.css('#tadjs')
# Search for nodes by xpath
doc.xpath('//*[@id="top"]/div[1]')
# Other operations
# Chain selectors, then read the href of the third matching link
doc.css('div').css('a')[2].attr('href')
# Get the inner text; no "[0]" is needed to access the first match
doc.css('p').innerText()
# <table> tags converted to dicts
doc.tables
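The example above chains `.css()` calls and calls `.attr()` and `.innerText()` directly on a list of matches. One way such an interface can work is sketched below: a list subclass whose chaining method searches below every match, and whose helper methods delegate to the first match. The `Node` and `NodeList` classes are hypothetical toys, not PyCrawl's actual implementation, and their `css()` only matches bare tag names.

```python
class Node:
    """A toy element with a tag, attributes, text and children."""

    def __init__(self, tag, attrs=None, text="", children=()):
        self.tag = tag
        self.attrs = attrs or {}
        self.text = text
        self.children = list(children)

    def descendants(self):
        # Depth-first, document order
        for child in self.children:
            yield child
            yield from child.descendants()

    def attr(self, name):
        return self.attrs.get(name)

    def innerText(self):
        return self.text


class NodeList(list):
    """A list of matches whose helper methods act on the first match."""

    def css(self, tag):
        # Chaining: search below every node in the list and flatten the results
        return NodeList(d for node in self for d in node.descendants() if d.tag == tag)

    def attr(self, name):
        # No "[0]" needed: delegate to the first match
        return self[0].attr(name)

    def innerText(self):
        return self[0].innerText()


page = Node("html", children=[
    Node("div", children=[Node("a", {"href": "/one"}), Node("a", {"href": "/two"})]),
    Node("div", children=[Node("a", {"href": "/three"}), Node("p", text="Hello")]),
])
root = NodeList([page])

links = root.css("div").css("a")
print(links[2].attr("href"))      # /three
print(root.css("p").innerText())  # Hello
```

Because `links[2]` is a single `Node`, `Node` needs its own `attr()`; `NodeList` just forwards to it, which keeps the chained and indexed forms consistent.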