Client interface for Scrapinghub API
Project description
Requirements
Python 2.6 or above
Requests library
Usage
The scrapinghub module is a Python library for communicating with the Scrapinghub API.
First, you connect to Scrapinghub:
>>> from scrapinghub import Connection >>> conn = Connection('APIKEY') >>> conn Connection('APIKEY')
You can list the projects available to your account:
>>> conn.project_ids() [123, 456]
And select a particular project to work with:
>>> project = conn[123] >>> project Project(Connection('APIKEY'), 123) >>> project.id 123
To schedule a spider run (it returns the job id):
>>> project.schedule('myspider', arg1='val1') u'123/1/1'
To get the list of spiders in the project:
>>> project.spiders() [ {u'id': u'spider1', u'tags': [], u'type': u'manual', u'version': u'123'}, {u'id': u'spider2', u'tags': [], u'type': u'manual', u'version': u'123'} ]
To get all finished jobs:
>>> jobs = project.jobs(state='finished')
jobs is a JobSet. JobSet objects are iterable and, when iterated, return an iterable of Job objects, so you typically use it like this:
>>> for job in jobs: ... # do something with job
Or, if you just want to get the job ids:
>>> [x.id for x in jobs] [u'123/1/1', u'123/1/2', u'123/1/3']
To select a specific job:
>>> job = project.job(u'123/1/2') >>> job.id u'123/1/2'
To retrieve all scraped items from a job:
>>> for item in job.items(): ... # do something with item (it's just a dict)
To retrieve all log entries from a job:
>>> for logitem in job.log(): ... # logitem is a dict with logLevel, message, time
To get job info:
>>> job.info['spider'] 'myspider' >>> job.info['started_time'] '2010-09-28T15:09:57.629000' >>> job.info['tags'] [] >>> job.info['fields_count]['description'] 1253
To mark a job with tag consumed:
>>> job.update(add_tag='consumed')
To mark several jobs with tag consumed (JobSet also supports the update() method):
>>> project.jobs(state='finished').update(add_tag='consumed')
To delete a job:
>>> job.delete()
To delete several jobs (JobSet also supports the update() method):
>>> project.jobs(state='finished').delete()
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for scrapinghub-1.8.0-py2-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ad32eb392fac3398d123a6d8171a7fca83b9999a956c6d3ae6e08bdd322e38d6 |
|
MD5 | 732a845d1e07743d2b799402ff844398 |
|
BLAKE2b-256 | 637d57d016ed866483315c24750b5a6762917c777e10cdabb4b3e4b283467f28 |