A middleware to cache HTTP responses for Scrapy

Overview

scrapy-httpcache is a Scrapy middleware that stores the HTTP cache in MongoDB. It also ships two extra storage plugins: request_error_storage, which saves requests that raised an error, and banned_storage, which saves requests detected as banned. The block_checker that banned_storage uses to detect a ban can be overridden.
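
As an illustration of the kind of logic a custom ban check might contain, here is a minimal sketch. The function name looks_banned, its (status, body) arguments, and the status codes and markers are all hypothetical; this README does not document the real block_checker signature, so check the package source before adapting it.

```python
# Hypothetical ban check; NOT the library's actual block_checker interface.
# The status codes and markers below are illustrative assumptions.
BAN_STATUS_CODES = {403, 429}
BAN_MARKERS = (b"captcha", b"access denied")


def looks_banned(status, body):
    """Return True if a response looks like a ban page.

    status: integer HTTP status code
    body:   raw response body as bytes
    """
    if status in BAN_STATUS_CODES:
        return True
    # Case-insensitive search for typical ban-page markers.
    lowered = body.lower()
    return any(marker in lowered for marker in BAN_MARKERS)
```

Working on plain values (status code and body bytes) keeps the check easy to unit-test outside a running crawl.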

Requirements

  • Python 3.3+

  • Works on Linux, Windows, macOS, BSD

Install

The quick way:

pip install scrapy-httpcache

Or copy this middleware into your Scrapy project.

Documentation

Configure the middleware in settings.py, for example:

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE SETTINGS
# -----------------------------------------------------------------------------
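# Assumes DOWNLOADER_MIDDLEWARES is already defined earlier in settings.py;
# otherwise assign the dict directly instead of calling update().
DOWNLOADER_MIDDLEWARES.update({
    # Disable Scrapy's built-in cache middleware...
    'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware': None,
    # ...and enable the asynchronous one from this package in its place.
    'scrapy_httpcache.downloadermiddlewares.httpcache.AsyncHttpCacheMiddleware': 900,
})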

HTTPCACHE_ENABLED = True
HTTPCACHE_IGNORE_HTTP_CODES = [301, 302, 500, 503]
HTTPCACHE_STORAGE = 'scrapy_httpcache.extensions.httpcache_storage.MongoDBCacheStorage'
HTTPCACHE_MONGODB_STORAGE_URI = 'mongodb://127.0.0.1:27017'
# MONGODB_DATABASE is assumed to be defined earlier in settings.py.
HTTPCACHE_MONGODB_STORAGE_DB = MONGODB_DATABASE
HTTPCACHE_MONGODB_STORAGE_COLL = 'cache'

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE BANNED SETTINGS (optional)
# -----------------------------------------------------------------------------
BANNED_STORAGE = 'scrapy_httpcache.extensions.banned_storage.MongoBannedStorage'
BANNED_MONGODB_STORAGE_URI = 'mongodb://127.0.0.1:27017'
BANNED_MONGODB_STORAGE_DB = MONGODB_DATABASE

# -----------------------------------------------------------------------------
# SCRAPY HTTPCACHE REQUEST ERROR SETTINGS (optional)
# -----------------------------------------------------------------------------
REQUEST_ERROR_STORAGE = 'scrapy_httpcache.extensions.request_error_storage.MongoRequestErrorStorage'
REQUEST_ERROR_MONGODB_STORAGE_URI = 'mongodb://127.0.0.1:27017'
REQUEST_ERROR_MONGODB_STORAGE_DB = MONGODB_DATABASE
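
The DOWNLOADER_MIDDLEWARES entries above rely on two Scrapy conventions: a value of None disables a middleware, and the remaining middlewares are ordered by their integer value. The following is a simplified model of that behaviour, not Scrapy's own code. The helper name enabled_middlewares is hypothetical, and the base dict lists only two of Scrapy's default entries for brevity.

```python
def enabled_middlewares(base, overrides):
    """Simplified model: merge defaults with project settings,
    drop entries set to None, and sort the rest by order value."""
    merged = {**base, **overrides}
    active = {path: order for path, order in merged.items() if order is not None}
    return sorted(active, key=active.get)


RETRY = 'scrapy.downloadermiddlewares.retry.RetryMiddleware'
BUILTIN_CACHE = 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware'
ASYNC_CACHE = 'scrapy_httpcache.downloadermiddlewares.httpcache.AsyncHttpCacheMiddleware'

# Two of Scrapy's default downloader middlewares (subset, for illustration).
base = {RETRY: 550, BUILTIN_CACHE: 900}

# The overrides from the settings example above.
overrides = {BUILTIN_CACHE: None, ASYNC_CACHE: 900}

result = enabled_middlewares(base, overrides)
```

Giving the replacement the same order value (900) as the built-in cache middleware places it at the same position in the downloader chain.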
