Skip to main content

Distribution Support for Scrapy & Gerapy using RabbitMQ

Project description

Gerapy RabbitMQ

This is a package for supporting distribution in Scrapy using RabbitMQ, also this package is a module in Gerapy.

Installation

You can install with this command:

pip3 install gerapy-rabbitmq

Usage

Required configuration:

# Use RabbitMQ for queue
SCHEDULER = "gerapy_rabbitmq.scheduler.Scheduler"
SCHEDULER_QUEUE_KEY = '%(spider)s_requests'

# RabbitMQ Connection Parameters, see https://pika.readthedocs.io/en/stable/modules/parameters.html
RABBITMQ_CONNECTION_PARAMETERS = {
    'host': 'localhost'
}

# Use Redis for dupefilter
DUPEFILTER_CLASS = "gerapy_redis.dupefilter.RFPDupeFilter"
SCHEDULER_DUPEFILTER_KEY = '%(spider)s:dupefilter'

Optional configuration:

# RabbitMQ Queue Configuration
SCHEDULER_QUEUE_DURABLE = True
SCHEDULER_QUEUE_MAX_PRIORITY = 100
SCHEDULER_QUEUE_PRIORITY_OFFSET = 30
SCHEDULER_QUEUE_FORCE_FLUSH = True
SCHEDULER_PERSIST = False
SCHEDULER_IDLE_BEFORE_CLOSE = 0
SCHEDULER_FLUSH_ON_START = False
SCHEDULER_PRE_ENQUEUE_ALL_START_REQUESTS = True

More

For more detail, you can refer to example.

RabbitMQ Preview

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gerapy-rabbitmq-0.1.1.tar.gz (6.3 kB view hashes)

Uploaded Source

Built Distribution

gerapy_rabbitmq-0.1.1-py2.py3-none-any.whl (6.7 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page