Spider Runner for Scrapy
Project description
Scrapy Do is a daemon that provides a convenient way to run Scrapy spiders. It can run them once, immediately, or periodically at specified time intervals. It was inspired by scrapyd but written from scratch. For the time being, it only comes with a REST API. Version 0.2.0 will add a command-line client, and version 0.3.0 an interactive web interface.
Homepage: https://jany.st/scrapy-do.html
Documentation: https://scrapy-do.readthedocs.io/en/latest/
Quick Start
Install scrapy-do using pip:
$ pip install scrapy-do
Start the daemon in the foreground:
$ scrapy-do -n scrapy-do
Open another terminal window, download Scrapy's quotesbot example, and create a deployable archive:
$ git clone https://github.com/scrapy/quotesbot.git
$ cd quotesbot
$ git archive master -o quotesbot.zip --prefix=quotesbot/
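If git is not available, an equivalent archive can be built with Python's zipfile module. The sketch below is an assumption based on what git archive produces above: a zip whose entries all sit under a top-level quotesbot/ directory. The helper name make_archive is illustrative, not part of Scrapy Do.

```python
import os
import zipfile

def make_archive(src_dir, out_path, prefix):
    """Zip the contents of src_dir, placing every entry under prefix/,
    mimicking `git archive --prefix=prefix/`."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, prefix + "/" + rel)

# e.g. make_archive("quotesbot", "quotesbot.zip", "quotesbot")
```

Unlike git archive, this includes untracked files, so clean the working tree first if that matters.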
Push the code to the server:
$ curl -s http://localhost:7654/push-project.json \
     -F name=quotesbot \
     -F archive=@quotesbot.zip | jq -r
{
  "status": "ok",
  "spiders": [
    "toscrape-css",
    "toscrape-xpath"
  ]
}
Schedule some jobs:
$ curl -s http://localhost:7654/schedule-job.json \
     -F project=quotesbot \
     -F spider=toscrape-css \
     -F "when=every 2 to 3 hours" | jq -r
{
  "status": "ok",
  "identifier": "04a38a03-1ce4-4077-aee1-e8275d1c20b6"
}

$ curl -s http://localhost:7654/schedule-job.json \
     -F project=quotesbot \
     -F spider=toscrape-css \
     -F when=now | jq -r
{
  "status": "ok",
  "identifier": "83d447b0-ba6e-42c5-a80f-6982b2e860cf"
}
See what’s going on:
$ curl -s "http://localhost:7654/list-jobs.json?status=ACTIVE" | jq -r { "status": "ok", "jobs": [ { "identifier": "83d447b0-ba6e-42c5-a80f-6982b2e860cf", "status": "RUNNING", "actor": "USER", "schedule": "now", "project": "quotesbot", "spider": "toscrape-css", "timestamp": "2017-12-10 22:33:14.853565", "duration": null }, { "identifier": "04a38a03-1ce4-4077-aee1-e8275d1c20b6", "status": "SCHEDULED", "actor": "USER", "schedule": "every 2 to 3 hours", "project": "quotesbot", "spider": "toscrape-css", "timestamp": "2017-12-10 22:31:12.320832", "duration": null } ] }
Project details
Download files
Hashes for scrapy_do-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d90388c240648c391b1dc4d911d158d735a487f7c3c2c82d6b3d08db353afecd
MD5 | b34748d95f5503b9a1f731e8c3b2295b
BLAKE2b-256 | f33f0a9fdcbbde07dacaccaa13f23c311e1d58808b2637b9b89d870093f79681