
scrapy-pagestorage 0.2.1

Scrapy extension to store info in storage service

A Scrapy extension that stores request and response information in a storage service.

Installation

You can install scrapy-pagestorage using pip:

pip install scrapy-pagestorage

You can then enable the middleware in your settings.py:

SPIDER_MIDDLEWARES = {
    ...
    'scrapy_pagestorage.PageStorageMiddleware': 900
}

How to use it

Enable the extension through settings.py:

PAGE_STORAGE_ENABLED = True
PAGE_STORAGE_ON_ERROR_ENABLED = True

Configure the extension through settings.py:

PAGE_STORAGE_MODE = "VERSIONED_CACHE"
PAGE_STORAGE_LIMIT = 100
PAGE_STORAGE_ON_ERROR_LIMIT = 100
PAGE_STORAGE_TRIM_HTML = True

The extension is auto-enabled for Portia spiders (SHUB_SPIDER_TYPE=portia).
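
The enabling rules above can be summarised in a short sketch. The helper name is_page_storage_enabled and its arguments are hypothetical, and the extension's own implementation may differ. The sketch only assumes that either the PAGE_STORAGE_ENABLED setting or the SHUB_SPIDER_TYPE=portia environment variable is enough to turn storage on.

```python
import os


def is_page_storage_enabled(settings, environ=None):
    """Return True if page storage should be active (hypothetical helper).

    `settings` is any mapping of setting names to values; `environ`
    defaults to the process environment.
    """
    environ = os.environ if environ is None else environ
    # Explicit opt-in through settings.py.
    explicitly_enabled = bool(settings.get("PAGE_STORAGE_ENABLED", False))
    # Assumed auto-enable rule for Portia spiders.
    is_portia = environ.get("SHUB_SPIDER_TYPE") == "portia"
    return explicitly_enabled or is_portia
```

Passing the environment as an argument keeps the check easy to exercise without changing the real process environment.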

Settings

PAGE_STORAGE_MODE

Default: None

A string that selects which store the extension uses: the cache store or the versioned cache store. Set PAGE_STORAGE_MODE = "VERSIONED_CACHE" to use the versioned one.

PAGE_STORAGE_LIMIT

An integer that limits the number of visited pages to store.

PAGE_STORAGE_ON_ERROR_LIMIT

An integer that limits the number of error pages to store.
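
The two limit settings can be pictured as simple counters. The following is an illustrative sketch only: the PageQuota class is hypothetical and is not part of scrapy-pagestorage, and it assumes that pages beyond the limit are skipped rather than replacing earlier ones.

```python
class PageQuota:
    """Counts stored pages and refuses new ones past a limit (illustrative only)."""

    def __init__(self, limit):
        self.limit = limit
        self.stored = 0

    def try_store(self):
        """Return True and count the page if the quota still has room."""
        if self.stored >= self.limit:
            return False
        self.stored += 1
        return True


# One quota per setting, mirroring the example values used in settings.py.
visited_quota = PageQuota(limit=100)  # PAGE_STORAGE_LIMIT
error_quota = PageQuota(limit=100)    # PAGE_STORAGE_ON_ERROR_LIMIT
```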

PAGE_STORAGE_TRIM_HTML

Default: False

Remove whitespace from the start and end of the HTML to reduce file size.
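
As a rough illustration of what trimming means here, the sketch below strips leading and trailing whitespace from a page body. The trim_html function is a hypothetical stand-in, assuming the behaviour matches Python's str.strip(); the extension's actual code may handle bytes or encodings differently.

```python
def trim_html(body):
    """Strip whitespace from both ends of an HTML string (hypothetical helper)."""
    return body.strip()


raw = "\n\n   <html><body> Hello </body></html>  \n"
trimmed = trim_html(raw)
# Whitespace inside the document is left untouched; only the ends are trimmed.
saved_chars = len(raw) - len(trimmed)
```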

 
File                                       Type          Py Version  Uploaded on  Size
scrapy-pagestorage-0.2.1.tar.gz            Source                    2017-08-16   3KB
scrapy_pagestorage-0.2.1-py2-none-any.whl  Python Wheel  py2         2017-08-16   5KB