Spider Runner for Scrapy

Project description

Scrapy Do is a daemon that provides a convenient way to run Scrapy spiders. It can run them once, immediately, or periodically at specified time intervals. It was inspired by scrapyd but written from scratch. For the time being, it only comes with a REST API; version 0.2.0 will add a command-line client, and version 0.3.0 an interactive web interface.
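The REST API can also be driven from Python rather than curl. A minimal sketch using only the standard library; the helper names are illustrative (not part of scrapy-do), and it assumes the endpoint accepts URL-encoded form bodies in addition to the multipart forms used in the curl examples below:

```python
import json
from urllib import parse, request


def parse_schedule_response(payload):
    """Extract the job identifier from a schedule-job.json response dict."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"scheduling failed: {payload}")
    return payload["identifier"]


def schedule_job(base_url, project, spider, when):
    """POST a scheduling request to /schedule-job.json and return the job id.

    Note: the curl examples use multipart forms (-F); whether the server also
    accepts url-encoded bodies is an assumption of this sketch.
    """
    body = parse.urlencode(
        {"project": project, "spider": spider, "when": when}).encode()
    with request.urlopen(f"{base_url}/schedule-job.json", data=body) as resp:
        return parse_schedule_response(json.load(resp))


# Usage against a running daemon would look like:
# job_id = schedule_job("http://localhost:7654", "quotesbot",
#                       "toscrape-css", "every 2 to 3 hours")
```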

Quick Start

  • Install scrapy-do using pip:

    $ pip install scrapy-do
  • Start the daemon in the foreground:

    $ scrapy-do -n scrapy-do
  • Open another terminal window, download Scrapy's quotesbot example, and create a deployable archive:

    $ git clone https://github.com/scrapy/quotesbot.git
    $ cd quotesbot
    $ git archive master -o quotesbot.zip --prefix=quotesbot/
  • Push the code to the server:

    $ curl -s http://localhost:7654/push-project.json \
           -F name=quotesbot \
           -F archive=@quotesbot.zip | jq .
    {
      "status": "ok",
      "spiders": [
        "toscrape-css",
        "toscrape-xpath"
      ]
    }
  • Schedule some jobs:

    $ curl -s http://localhost:7654/schedule-job.json \
           -F project=quotesbot \
           -F spider=toscrape-css \
           -F "when=every 2 to 3 hours" | jq .
    {
      "status": "ok",
      "identifier": "04a38a03-1ce4-4077-aee1-e8275d1c20b6"
    }
    
    $ curl -s http://localhost:7654/schedule-job.json \
           -F project=quotesbot \
           -F spider=toscrape-css \
           -F when=now | jq .
    {
      "status": "ok",
      "identifier": "83d447b0-ba6e-42c5-a80f-6982b2e860cf"
    }
  • See what’s going on:

    $ curl -s "http://localhost:7654/list-jobs.json?status=ACTIVE" | jq .
    {
      "status": "ok",
      "jobs": [
        {
          "identifier": "83d447b0-ba6e-42c5-a80f-6982b2e860cf",
          "status": "RUNNING",
          "actor": "USER",
          "schedule": "now",
          "project": "quotesbot",
          "spider": "toscrape-css",
          "timestamp": "2017-12-10 22:33:14.853565",
          "duration": null
        },
        {
          "identifier": "04a38a03-1ce4-4077-aee1-e8275d1c20b6",
          "status": "SCHEDULED",
          "actor": "USER",
          "schedule": "every 2 to 3 hours",
          "project": "quotesbot",
          "spider": "toscrape-css",
          "timestamp": "2017-12-10 22:31:12.320832",
          "duration": null
        }
      ]
    }
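The JSON responses above are easy to post-process in scripts. A small sketch that groups job identifiers by status; the `summarize_jobs` helper is illustrative, not part of scrapy-do, and the sample payload is the `list-jobs.json` response shown above:

```python
import json


def summarize_jobs(payload):
    """Group job identifiers from a list-jobs.json response by status."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"request failed: {payload}")
    summary = {}
    for job in payload["jobs"]:
        summary.setdefault(job["status"], []).append(job["identifier"])
    return summary


# Sample payload, taken verbatim from the list-jobs.json response above
response = json.loads("""
{
  "status": "ok",
  "jobs": [
    {"identifier": "83d447b0-ba6e-42c5-a80f-6982b2e860cf",
     "status": "RUNNING", "actor": "USER", "schedule": "now",
     "project": "quotesbot", "spider": "toscrape-css",
     "timestamp": "2017-12-10 22:33:14.853565", "duration": null},
    {"identifier": "04a38a03-1ce4-4077-aee1-e8275d1c20b6",
     "status": "SCHEDULED", "actor": "USER",
     "schedule": "every 2 to 3 hours", "project": "quotesbot",
     "spider": "toscrape-css",
     "timestamp": "2017-12-10 22:31:12.320832", "duration": null}
  ]
}
""")

print(summarize_jobs(response))
```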

Download files

Source distribution: scrapy-do-0.1.0.tar.gz (15.5 kB)

Built distribution: scrapy_do-0.1.0-py3-none-any.whl (20.7 kB, Python 3)
