Spider Runner for Scrapy

Project description

Scrapy Do is a daemon that provides a convenient way to run Scrapy spiders. It can run them once, immediately, or periodically at specified time intervals. It was inspired by scrapyd but written from scratch. For the time being, it only comes with a REST API; version 0.2.0 will add a command-line client, and version 0.3.0 an interactive web interface.
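The REST API can also be driven from Python rather than curl. A minimal sketch using only the standard library; the helper names are illustrative (not part of scrapy-do), and it assumes the endpoint accepts URL-encoded form bodies in addition to the multipart forms used in the curl examples below:

```python
import json
from urllib import parse, request


def parse_schedule_response(payload):
    """Extract the job identifier from a schedule-job.json response dict."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"scheduling failed: {payload}")
    return payload["identifier"]


def schedule_job(base_url, project, spider, when):
    """POST a scheduling request to /schedule-job.json and return the job id.

    Note: the curl examples use multipart forms (-F); whether the server also
    accepts url-encoded bodies is an assumption of this sketch.
    """
    body = parse.urlencode(
        {"project": project, "spider": spider, "when": when}).encode()
    with request.urlopen(f"{base_url}/schedule-job.json", data=body) as resp:
        return parse_schedule_response(json.load(resp))


# Usage against a running daemon would look like:
# job_id = schedule_job("http://localhost:7654", "quotesbot",
#                       "toscrape-css", "every 2 to 3 hours")
```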

Quick Start

  • Install scrapy-do using pip:

    $ pip install scrapy-do
  • Start the daemon in the foreground:

    $ scrapy-do -n scrapy-do
  • Open another terminal window, download Scrapy's quotesbot example, and create a deployable archive:

    $ git clone https://github.com/scrapy/quotesbot.git
    $ cd quotesbot
    $ git archive master -o quotesbot.zip --prefix=quotesbot/
  • Push the code to the server:

    $ curl -s http://localhost:7654/push-project.json \
           -F name=quotesbot \
           -F archive=@quotesbot.zip | jq .
    {
      "status": "ok",
      "spiders": [
        "toscrape-css",
        "toscrape-xpath"
      ]
    }
  • Schedule some jobs:

    $ curl -s http://localhost:7654/schedule-job.json \
           -F project=quotesbot \
           -F spider=toscrape-css \
           -F "when=every 2 to 3 hours" | jq .
    {
      "status": "ok",
      "identifier": "04a38a03-1ce4-4077-aee1-e8275d1c20b6"
    }
    
    $ curl -s http://localhost:7654/schedule-job.json \
           -F project=quotesbot \
           -F spider=toscrape-css \
           -F when=now | jq .
    {
      "status": "ok",
      "identifier": "83d447b0-ba6e-42c5-a80f-6982b2e860cf"
    }
  • See what’s going on:

    $ curl -s "http://localhost:7654/list-jobs.json?status=ACTIVE" | jq .
    {
      "status": "ok",
      "jobs": [
        {
          "identifier": "83d447b0-ba6e-42c5-a80f-6982b2e860cf",
          "status": "RUNNING",
          "actor": "USER",
          "schedule": "now",
          "project": "quotesbot",
          "spider": "toscrape-css",
          "timestamp": "2017-12-10 22:33:14.853565",
          "duration": null
        },
        {
          "identifier": "04a38a03-1ce4-4077-aee1-e8275d1c20b6",
          "status": "SCHEDULED",
          "actor": "USER",
          "schedule": "every 2 to 3 hours",
          "project": "quotesbot",
          "spider": "toscrape-css",
          "timestamp": "2017-12-10 22:31:12.320832",
          "duration": null
        }
      ]
    }
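The JSON responses above are easy to post-process in scripts. A small sketch that groups job identifiers by status; the `summarize_jobs` helper is illustrative, not part of scrapy-do, and the sample payload is the `list-jobs.json` response shown above:

```python
import json


def summarize_jobs(payload):
    """Group job identifiers from a list-jobs.json response by status."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"request failed: {payload}")
    summary = {}
    for job in payload["jobs"]:
        summary.setdefault(job["status"], []).append(job["identifier"])
    return summary


# Sample payload, taken verbatim from the list-jobs.json response above
response = json.loads("""
{
  "status": "ok",
  "jobs": [
    {"identifier": "83d447b0-ba6e-42c5-a80f-6982b2e860cf",
     "status": "RUNNING", "actor": "USER", "schedule": "now",
     "project": "quotesbot", "spider": "toscrape-css",
     "timestamp": "2017-12-10 22:33:14.853565", "duration": null},
    {"identifier": "04a38a03-1ce4-4077-aee1-e8275d1c20b6",
     "status": "SCHEDULED", "actor": "USER",
     "schedule": "every 2 to 3 hours", "project": "quotesbot",
     "spider": "toscrape-css",
     "timestamp": "2017-12-10 22:31:12.320832", "duration": null}
  ]
}
""")

print(summarize_jobs(response))
```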

Download files

Source distribution: scrapy-do-0.1.0.tar.gz (15.5 kB)

Built distribution: scrapy_do-0.1.0-py3-none-any.whl (20.7 kB, Python 3)
