scrapyrwiki 0.2

A collection of helpers for running scrapers built with Scrapy in ScraperWiki

Launch a scraper without the Scrapy CLI

Example:

from scrapy.conf import settings
from scrapyrwiki import run_spider

def main():
    # MySpider stands for your own spider class, defined or imported above.
    run_spider(MySpider(), settings)

if __name__ == '__main__':
    main()

Save produced data to ScraperWiki

Just add "scrapyrwiki.pipelines.ScraperWikiPipeline" to the ITEM_PIPELINES setting.

Example:

from scrapy.conf import settings
from scrapyrwiki import run_spider

def scraperwiki():
    options = {
        'SW_SAVE_BUFFER': 5,
        'SW_UNIQUE_KEYS': {"MyItem": ['url']},
        'ITEM_PIPELINES': ['scrapyrwiki.pipelines.ScraperWikiPipeline'],
    }
    settings.overrides.update(options)
    run_spider(MySpider(), settings)


# ScraperWiki executes the script with __name__ set to 'scraper', not '__main__'.
if __name__ == 'scraper':
    scraperwiki()
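
The setting names suggest that items are buffered and written in batches, and that the unique keys identify rows to overwrite instead of duplicating them. The sketch below illustrates that idea with the standard library's sqlite3 module. It is not scrapyrwiki's implementation: BufferedSaver, its parameters, and the table layout are hypothetical and exist only for this example.

```python
import sqlite3


class BufferedSaver:
    """Hypothetical illustration: collect rows and upsert them in batches."""

    def __init__(self, conn, table, columns, unique_keys, buffer_size=5):
        self.conn = conn
        self.table = table
        self.columns = list(columns)
        self.buffer_size = buffer_size
        self.buffer = []
        cols = ", ".join(self.columns)
        keys = ", ".join(unique_keys)
        # The UNIQUE constraint lets INSERT OR REPLACE overwrite matching rows.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ({cols}, UNIQUE ({keys}))"
        )

    def save(self, row):
        # Keep the row in memory until the buffer is full.
        self.buffer.append(tuple(row[c] for c in self.columns))
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Write all buffered rows in one batch, then empty the buffer.
        if not self.buffer:
            return
        cols = ", ".join(self.columns)
        marks = ", ".join("?" for _ in self.columns)
        self.conn.executemany(
            f"INSERT OR REPLACE INTO {self.table} ({cols}) VALUES ({marks})",
            self.buffer,
        )
        self.conn.commit()
        self.buffer = []


def demo():
    conn = sqlite3.connect(":memory:")
    saver = BufferedSaver(conn, "MyItem", ["url", "title"], ["url"], buffer_size=2)
    saver.save({"url": "http://example.com/a", "title": "first"})
    # Same url as above: this row replaces the first one, and fills the buffer.
    saver.save({"url": "http://example.com/a", "title": "updated"})
    saver.save({"url": "http://example.com/b", "title": "other"})
    saver.flush()  # write whatever is still buffered
    return conn.execute("SELECT url, title FROM MyItem ORDER BY url").fetchall()
```

A small buffer means more frequent writes and less data lost if the run is interrupted; a larger one means fewer write calls.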

Check spider contracts in CI

Just launch the spider with run_tests.

Example:

from scrapyrwiki import run_tests
from scrapy.conf import settings

run_tests(MySpider(), "output.xml", settings)

Note: The tests run against the HTTP cache. The directory from which the script is launched must contain a scrapy.cfg file (Scrapy needs it to recognize the directory as a scraper project) and a .scrapy directory holding the HTTP cache database.
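
A CI job can check these two prerequisites before starting the tests. The following is a minimal sketch using only the standard library; missing_test_prerequisites is a hypothetical helper, not part of scrapyrwiki.

```python
from pathlib import Path


def missing_test_prerequisites(directory="."):
    """Return the names of required entries that are absent from `directory`."""
    base = Path(directory)
    missing = []
    # scrapy.cfg marks the directory as a Scrapy project.
    if not (base / "scrapy.cfg").is_file():
        missing.append("scrapy.cfg")
    # .scrapy is expected to hold the HTTP cache used during testing.
    if not (base / ".scrapy").is_dir():
        missing.append(".scrapy")
    return missing


if __name__ == "__main__":
    problems = missing_test_prerequisites()
    if problems:
        raise SystemExit("Missing in launch directory: " + ", ".join(problems))
```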

The output is in XUnit format and has been tested with Jenkins.
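
Outside Jenkins, a script may want to inspect the report directly. The sketch below assumes the common XUnit layout, in which testcase elements carry a failure or error child when something went wrong; the exact structure that run_tests writes is not documented here, and both summarize_xunit and the SAMPLE report are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_xunit(xml_text):
    """Return (number of test cases, names of failed or errored cases)."""
    root = ET.fromstring(xml_text)
    total = 0
    failed = []
    # iter() also matches nested elements, so a <testsuites> wrapper is handled too.
    for case in root.iter("testcase"):
        total += 1
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(case.get("name"))
    return total, failed


# Hypothetical report used only to exercise the function.
SAMPLE = """<testsuite name="MySpider" tests="2" failures="1">
  <testcase classname="MySpider" name="parse_ok"/>
  <testcase classname="MySpider" name="parse_broken">
    <failure message="hypothetical failure"/>
  </testcase>
</testsuite>"""
```

For a real run, the text of output.xml would be read from disk and passed to the function in place of SAMPLE.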

Log scraper errors to Sentry

Install scrapy-sentry and set the SENTRY_DSN environment variable to your Sentry key. Scrapyrwiki handles the rest.
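
Since the whole setup hinges on one environment variable, a launch script can verify it up front. This is a minimal sketch with the standard library only; sentry_dsn_is_set is a hypothetical helper, and how scrapyrwiki itself reads the variable is not shown here.

```python
import os


def sentry_dsn_is_set(environ=None):
    """Return True if SENTRY_DSN is present and not blank."""
    env = os.environ if environ is None else environ
    return bool(env.get("SENTRY_DSN", "").strip())


if __name__ == "__main__":
    if not sentry_dsn_is_set():
        print("SENTRY_DSN is not set")
```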

 
File                    Type    Uploaded on  Size
scrapyrwiki-0.2.tar.gz  Source  2013-02-27   3KB