Index of Packages Matching 'scrapy'

Package Weight* Description
python-scrapyd-api 2.0.1 9 A Python wrapper for working with the Scrapyd API
s01.scrapy 0.16.2 9 Package for buildout based scrapy spider development
Scrapy 1.5.0 9 A high-level Web Crawling and Web Scraping framework
scrapy-beautifulsoup 0.0.2 9 Simple Scrapy middleware to process non-well-formed HTML with BeautifulSoup
scrapy-heroku 0.7.1 9 Utilities for running scrapy on heroku
scrapy-itemagic 0.2.4 9 Scrapy item parsing tools.
scrapy-nimbus 0.5.3 9 scrapy extend
scrapy-prometheus 0.4.4 9 Exporting scrapy stats as prometheus metrics through pushgateway service
scrapy-qiniu 0.1.2 9 Scrapy pipeline extension for
scrapy-random-ua 0.3 9 Scrapy Middleware to set a random User-Agent for every Request.
scrapy-random-useragent 0.2 9 Scrapy Middleware to set a random User-Agent for every Request.
scrapy-redirect 0.1.0 9 Restrict authorized Scrapy redirections to the website start_urls
scrapy-redis 0.6.8 9 Redis-based components for Scrapy.
scrapy-sqs-exporter 1.0.4 9 Scrapy extension for outputting scraped items to an Amazon SQS instance
scrapy-statsd 1.0.0a1 9 Publish Scrapy stats to statsd
scrapy-tdd 0.1.3 9 Helpers and examples to build Scrapy Crawlers in a test driven way.
scrapy-venom 0.1 9 Generic classes to deal with data scraping using Scrapy
scrapy-wayback-machine 1.0.0 9 A Scrapy middleware for scraping Wayback Machine snapshots from
scrapy_model 0.1.5 9 Scrapy helper to create scrapers from models
scrapybox 0.1 9 A Scrapy GUI
ScrapyElasticSearch 0.9.1 9 Scrapy pipeline which allows you to store multiple Scrapy items in Elasticsearch.
scrapyscript 1.0.0 9 Run a Scrapy spider programmatically from a script or a Celery task - no project required.
scrapy-dblite 0.2.7 8 Simple library for storing Scrapy Items in sqlite database
scrapy-djangoitem 1.1.1 8 Scrapy extension to write scraped items using Django models
scrapy-dotpersistence 0.3.0 8 Scrapy extension to sync `.scrapy` folder to an S3 bucket
scrapy-elasticsearch-bulk-item-exporter 0.2 8 An extension of Scrapy's JsonLinesItemExporter that exports to elasticsearch bulk format.
scrapy-feedexporter-sftp 1.0.4 8 Scrapy extension Feed Exporter Storage Backend to export items to an SFTP server
scrapy-fieldstats 0.1.4 8 A Scrapy extension to generate a summary of fields coverage from your scraped data.
scrapy-jsonrpc 0.3.0 8 Scrapy extension to control spiders using JSON-RPC
scrapy-lambda 0.1.1 8 Scrapy pipeline which invokes a lambda with the scraped item
scrapy-mongodb 0.12.0 8 Pipeline to MongoDB for Scrapy. Supports MongoDB replica sets
scrapy-pagestorage 0.2.1 8 Scrapy extension to store info in storage service
scrapy-redis-bloomfilter 0.7.0 8 Scrapy Redis BloomFilter
scrapy-rethinkdb 0.0.4 8 Scrapy pipeline for rethinkdb.
scrapy-s3pipeline 0.2.0 8 Scrapy pipeline to store chunked items into AWS S3 bucket
scrapy-ssdb 0.0.1 8 scrapy and ssdb
scrapy-vampire 0.1.0 8 utils for scrapy
scrapyapperyio 0.1.3 8 Scrapy pipeline which allows you to store scrapy items in database.
ScrapyCouchDB 0.2 8 Scrapy pipeline which allows you to store scrapy items in CouchDB database.
scrapyd 1.2.0 8 A service for running Scrapy spiders, with an HTTP API
scrapyd-client 1.2.0a1 8 A client for scrapyd
scrapyd-mongodb 0.1.1 8 Scrapyd Queue Management with MongoDB
ScrapyGraphite 0.2 8 Output scrapy statistics to carbon/graphite.
ScrapyMongoDB 0.4.3 8 Scrapy pipeline which allows you to store scrapy items in MongoDB database.
scrapysolr 0.2.0 8 Scrapy pipeline which allows you to store scrapy items in a Solr server.
django-scrapy 0.1a1 7 APIs for using Scrapy in Django
nimbus-scrapy 0.8.0 7 nimbus_scrapy
nimbus-scrapyd-api 0.1.3 7 nimbus_scrapyd_api
pyscrapy 0.3.0 7 Useful pyscrapy.
scrapinghub-entrypoint-scrapy 0.11.1 7 Scrapy entrypoint for Scrapinghub job runner
scrapy-algolia-exporter 0.0.2 7 Scrapy item exporter for the Algolia API
scrapy-boilerplate 0.2.1 7 Small set of utilities to simplify writing Scrapy spiders.
scrapy-botproxy 1.0.1 7 BotProxy (IP Rotating HTTP proxy) downloader middleware for Scrapy
scrapy-cloudflare-middleware 0.0.1 7 A Scrapy Middleware to bypass CloudFlare's anti-bot protection
scrapy-crawl-once 0.1.1 7 Scrapy middleware which allows crawling only new content
scrapy-deltafetch 1.2.1 7 Scrapy middleware to ignore previously crawled pages
scrapy-do 0.2.0 7 A Spider Runner for Scrapy
scrapy-eagle 0.0.37 7 Run Scrapy Distributed
scrapy-fake-useragent 1.1.0 7 Use a random User-Agent provided by fake-useragent for every request
scrapy-feedexporter-azure-blob 0.2 7 Scrapy extension Feed Exporter Storage Backend to export items to an Azure blob container
scrapy-hcf 1.0.0 7 Scrapy spider middleware to use Scrapinghub's Hub Crawl Frontier as a backend for URLs
scrapy-html-storage 0.3.0 7 Scrapy downloader middleware that stores response HTML files to disk.
scrapy-httpcache 0.0.4 7 A middleware to cache http response for Scrapy
scrapy-inline-requests 0.3.1 7 A decorator for writing coroutine-like spider callbacks.
scrapy-jsonschema 0.1.0 7 Scrapy schema validation pipeline and Item builder using JSON Schema
scrapy-kafka 0.1.1 7 Kafka-based components for Scrapy
scrapy-magicfields 1.1.0 7 Scrapy middleware to add extra "magic" fields to items
scrapy-mailgun 0.3.0 7 scrapy-mailgun: Hook emails with scrapy.
scrapy-mongodb-queue 0.1.0 7 MongoDB-based components for Scrapy
scrapy-mosquitera 0.1.1 7 Restrict crawl and scraping scope using matchers.
scrapy-multifeedexporter 0.1.1 7 Export scraped items of different types to multiple feeds.
scrapy-mysql-pipeline 2017.10.10 7 Asynchronous mysql Scrapy item pipeline
scrapy-proxy-rotator 0.1.0 7 Scrapy downloader middleware that rotates proxies.
scrapy-querycleaner 1.0.0 7 Scrapy spider middleware to clean up query parameters in request URLs
scrapy-rabbitmq 0.1.2 7 RabbitMQ Plug-in for Scrapy
scrapy-rotated-proxy 0.1.0 7 A middleware to rotate proxies for Scrapy
scrapy-rotating-proxies 0.5 7 Rotating proxies for Scrapy
scrapy-rss 0.1.5 7 RSS Tools for Scrapy Framework
scrapy-rss-exporter 0.1 7 An RSS Exporter for Scrapy
scrapy-s3-cache 0.0.1 7 Use S3 as a cache backend in Scrapy projects.
scrapy-save-statistics 0.2 7 Scrapy Save Statistics: Save statistics extension for Scrapy
scrapy-selenium 0.0.3 7 Scrapy with selenium
scrapy-sentry 0.8.0 7 Sentry component for Scrapy
scrapy-spiderdocs 0.1.2 7 Generate spiders md documentation based on spider docstrings.
scrapy-splash 0.7.2 7 JavaScript support for Scrapy using Splash
scrapy-splitvariants 1.1.0 7 Scrapy spider middleware to split an item into multiple items on a multi-valued key
scrapy-sqlitem 0.1.2 7 Scrapy extension to save items to a sql database
scrapy-statsd-middleware 0.0.8 7 Statsd integration middleware for scrapy
scrapy-status-mailer 0.3 7 Scrapy Status Mailer: Status mailer extension for Scrapy
scrapy-twostage 0.0.4 7 Two stage Scrapy spider: download and extract
Scrapy-UserAgents 0.0.1 7 A middleware to change user-agent in request for Scrapy
Scrapy_mingle 0.1.30 7 Some useful tools for Scrapy
scrapyd-heroku 0.1.0 7 A wrapper for running Scrapyd in Heroku or in console as normal Scrapyd service
scrapyd-subspider 0.0.2 7 scrapyd-subspider
scrapyd_kit 0.1.6 7 A kit for extending Scrapyd
scrapydo 0.2.2 7 Crochet-based blocking API for Scrapy.
scrapyjs 0.2 7 JavaScript support for Scrapy using Splash
scrapylib 1.7.1 7 Scrapy helper functions and processors
scrapymon 0.1.0 7 Simple management UI for scrapyd
scrapyrt 0.10 7 Put Scrapy spiders behind an HTTP API
scrapyrwiki 0.2 7 A collection of helpers for running Scrapy in ScraperWiki
scrapyz 0.3.3 7 Scrape Easy
avc-scrapy-helper 0.0.3 6 scrapy helper for scrapy.
elasticstats-scrapy 0.1.5 6 A scrapy extension to send crawl stats to elasticsearch index.
leon-scrapy-proxies 0.5.15 6 Scrapy Proxies: random proxy middleware for Scrapy
scrapy-amazon-robot-middleware-jondot 0.2.5 6 Scrapy middleware module which uses image parsing to submit a captcha response to amazon.
scrapy-amazon-robot-middleware3 0.3.3 6 Scrapy middleware module which uses image parsing to submit a captcha response to amazon.
scrapy-broadsoftxchange 1.17 6 Download documents and published software from Broadsoft Xchange
scrapy-corenlp 0.2.0 6 Scrapy spider middleware :: Stanford CoreNLP Named Entity Recognition
scrapy-crawlera 1.3.0 6 Crawlera middleware for Scrapy
scrapy-dynamodb 0.2 6 AWS DynamoDB pipeline for Scrapy
scrapy-elves 0.1.0 6 utils for parse html
scrapy-grpc 1.0 6 Scrapy extension to control spiders using gRPC
scrapy-kafka-export 0.1.1 6 Export Scrapy items to Kafka
scrapy-prometheus-exporter 1.0.2 6 Scrapy extension to export stats to Prometheus
scrapy-proxies 0.4 6 Scrapy Proxies: random proxy middleware for Scrapy
scrapy-proxymesh 0.0.3 6 Proxymesh downloader middleware for Scrapy
scrapy-rabbitmq-link 0.3.0 6 RabbitMQ plug-in for Scrapy
scrapy-streamitem 0.1.0 6 Scrapy support for working with streamcorpus Stream Items
scrapy-tools 0.0.5 6 tools for scrapy_tools, consist of middlewares
scrapyc 0.0.4 6 Simple client to scrapyd. Done right.
ScrapyDot 0.1 6 Export a graph of link between crawled items in dot file format.
scrapymongocache 0.1.0 6 A base pipeline and a decorator to allow you to cache item fields in MongoDB collections.
xonsh-scrapy-tabcomplete 0.3 6 scrapy tabcomplete support for the Xonsh shell
cyberplant-Scrapy 1.2.0.dev2 5 A high-level Web Crawling and Web Scraping framework
frontoxy 1.0.3 5 Distributed URLs frontier for Scrapy with RabbitMQ
noscrapy 0.0.2 5 Python port attempt of web-scraper-chrome-extension.
s01.core 0.5.0 5 Scrapy worker core packages
s01.demo 0.16.2 5 Buildout based scrapy spider demo package for s01.worker
s01.worker 0.5.0 5 Scrapy worker based on buildout with JSON-RPC 2.0 API
scrapy-block-inspector 0.0.2 5
scrapy-notifications 0.1.0 5 Send HTTP notifications on spider actions
scrapy-pipeline-mongodb 0.0.7 5
scrapy-proxy-validation 0.0.4 5
scutils 1.3.0.dev4 5 Utilities for Scrapy Cluster
stickymeta 0.0.5 5 Handy tools to maintain persistent meta values between requests in Scrapy spiders
custom-manager 0.91 4 A scrapy proxies handling
django-scrapy-douban 1.0 4 A simple Django app to collect and display movie's short comments from
metascrapy 0.4 4 Scrapes meta data from a link with python
scrapy-cdr 0.6.0 4
scrapy-rediscluster 1.0.2 4
scrapy-sqs-pipeline 0.1.1 4 Write scraped items to Amazon SQS.
scrapy-ui 0.1 4 Placeholder package
scrapydd 0.5.0 4
web-walker 3.1.5 4 Crawl web pages with little configuration. Based on Scrapy.
arachnado 0.2 3 Scrapy-based Web Crawler with a UI
Arachne 0.5.0 3 API for Scrapy spiders
Arachne-Strahi 0.5.0 3 API for Scrapy spiders, adjusted for Novelship
ArachneStrahi 0.5.0 3 API for Scrapy spiders, adjusted for Novelship
autologin-middleware 0.1.6 3 A Scrapy middleware to use with autologin
django-dynamic-scraper 0.13.1 3 Creating Scrapy scrapers via the Django admin interface
django-scraoy 0.1a1 3 APIs for using Scrapy in Django
FishFishJump 0.2.2 3 Simple solution for search engines in the python
habra-favorites 1.4.0 3 Sort your favorites posts from
hepcrawl 9.0.13 3 Scrapy project for feeds into INSPIRE-HEP (
hero-crawl 0.1.4 3 Helpers for Scrapy and Flask on Heroku
inspire-crawler 1.1.5 3 Crawler integration with INSPIRE-HEP.
pa11ycrawler 1.6.2 3 A Scrapy spider for a11y auditing Open edX installations.
pgpipeline 0.4.0 3 Pgpipeline: An automatic postgres item pipeline for Scrapy
proxy-middleware 0.2.0 3 Scrapy http proxy middleware that gets proxy parameters from settings
pyspider 0.3.9 3 A Powerful Spider System in Python
retr 1.4 3 Parallelised proxy-switching engine for scraping. Based on requests or aiohttp (beta). Includes scrapy middleware.
s01.client 0.5.0 3 JSON-RPC 2.0 s01.worker client
sitesearcher 0.1a2 3 A command line tool that creates fulltext search indexes of your favourite websites on your machine, and allows you to search them locally
weblocust 1.0.3 3 A more Powerful Spider System in Python based on pyspider
abu-quant 0.0.1 2 Quantitative stock trading
abupy 0.4.0 2 Abu quantitative system
abuQuant 0.0.1 2 Quantitative stock trading
aduana 0.2.1 2 Bindings for Aduana library
afaq-dl 1.0 2 Download the online book An Anarchist FAQ
crappyspider 0.3 2 Test your site.
crawl-frontier 0.2.0.post0.dev29 2 A flexible frontier for web crawlers
digs 0.1.7 2 Making easier the text crawling tasks over websites with depth levels.
distributed-frontera 0.2.0 2 [deprecated] Distributed version of Frontera, flexible frontier for web crawlers
fara_principals 0.0.7 2 A web scraper designed to collect Foreign Principal information from
frontera 0.7.1 2 A scalable frontier for web crawlers
fxportia 0.0.1 2 Convert portia spider definitions to python scrapy spiders
geocrawl 0.2.2 2 A library to stream geocaching related entities from the official website
greplink 0.0.1 2 web scraper based on asyncio in pure Python3
ilmwetter 0.2 2 Scrapy spider for the weather in Ilmenau
incapsula-cracker 0.1.3 2 A way to bypass incapsula robot checks when using requests or scrapy.
page_clustering 0.0.1 2 Online k-means clustering of web pages
page_finder 0.1.9 2
parsel 1.4.0 2 Parsel is a library to extract data from HTML and XML using XPath and CSS selectors
portia2code 0.0.16 2 Convert portia spider definitions to python scrapy spiders
py3spider 1.0.7 2 A Scrapy-like multi-threaded asynchronous web crawler based on py3.4+; for examples, visit
pycameo 0.1.2a1 2 cameo scrapy project
queuelib 1.4.2 2 Collection of persistent (disk-based) queues
recur7down 2 recursive web scraper code for work related project
scrapely 0.13.4 2 A pure-python HTML screen-scraping library
scrapoxy 1.11 2 Use Scrapoxy with Scrapy
scrapple 0.3.0 2 A framework for creating web content extractors
scrongo 0.0.0 2 Non-blocking ItemExporter for MongoDB
slybot 0.13.1 2 Slybot crawler
structure-spider 1.2.1 2 Multiple requests combined into a structured item.
take 0.2.0 2 A DSL for extracting data from a web page.
wayback-machine-scraper 1.0.7 2 A command-line utility for scraping Wayback Machine snapshots from
ze 0.0.17.dev1 2 Scraper for the larger news portals in Brazil.
ze-the-scraper 0.0.17.dev1 2 Scraper for the larger news portals in Brazil.
ant_nest 0.30.0 1 A simple and clear Web Crawler framework built on Python 3.6+ with async
aranha 0.1.1 1 simple gevent based web spider and tools
assignment 0.3 1 calculation of minimum price of given book
autologin 0.1.4 1 A utility for finding login links, forms and autologging into websites with a set of valid credentials.
autopager 0.2 1 Detect and classify pagination links on web pages
autoresponse 0.3.1 1 UNKNOWN
cerial 0.0.1 1 Python3 serializer with memoryview support
checklists 0.4.12 1 A reusable Django model for managing checklists of birds.
checklists_scrapers 0.2.3 1 Web scrapers for downloading checklists of birds from online databases such as eBird.
chinaAQI 0.1.2 1 The data scraped from
crawlmi 0.2.2 1 Highly optimized web scraping framework.
creepy 0.1.6 1 Dead simple web crawler for Python
Crwy 1.0.7 1 A Simple Web Crawling and Web Scraping framework
cssselect 1.0.3 1 cssselect parses CSS3 Selectors and translates them to XPath 1.0
datalad 0.9.1 1 data distribution geared toward scientific datasets
Django-Youtuber 0.1.1 1 Django app for fetching videos from channels or playlists.
DocDownloader 1.3.2 1 Downloads Documentation from ReadTheDocs in multiple formats
docxpy 0.8.5 1 A pure python-based utility to extract text, hyperlinks and images from docx files.
extruct 0.4.0 1 Extract embedded metadata from HTML markup
freebora 0.1.0 1
frontera-seedloader-mongodb 0.0.6 1
hsdata 0.2.16 1 Play Hearthstone with data! Quickly collect and analyze Hearthstone card and deck data.
hubstorage 0.23.6 1 Client interface for Scrapinghub HubStorage
icrawler 0.4.9 1 A mini framework of image crawlers
imagebot 1.2.1 1 A web bot to crawl websites and scrape images.
loginform 1.2.0 1 Fill HTML login forms automatically
main 0.1 1 calculation of minimum price of given book
MaybeDont 0.1.1 1 A component that tries to avoid downloading duplicate content
mdr 0.0.1 1 python library to detect and extract listing data from HTML page
minreq 0.1.0 1 Check required data in a request.
mp3_zing_downloader 1.3.3 1 Downloader to download music from
music_scraper 1.1.0 1 Gets Songs from the web and allows users to download the same
parsel-cli 0.2.0 1 Parsel Command Line Interface
persist-queue 0.3.5 1 A thread-safe disk based persistent queue in Python.
photopipe 0.1.0b4 1 PhotoPipe is a pipeline for automated reduction, photometry and astrometry of imaging data from RATIR and RIMAS.
pomp 0.2.1 1 Screen scraping and web crawling framework
pqueue 0.1.7 1 A single process, persistent multi-producer, multi-consumer queue.
ProxyYourSpider 1.0.2 1 Proxy your spider and crawl the galaxy.
PyNSXv 0.4.1 1 PyNSXv is a higher level python based library and CLI tool to control NSX for vSphere
pypackt 1.0.1 1 Tool to claim your daily free eBooks at with ease.
PyPyDispatcher 2.1.2 1 Multi-producer-multi-consumer signal dispatching mechanism
pyscrap 0.0.9 1 micro framework for web scraping
pystock-crawler 0.8.2 1 Crawl and parse stock historical data
rbco.recipe.pyeclipse 0.0.5 1 Creates a Pydev project for Eclipse.
requestium 0.1.9 1 Adds a Selenium webdriver and parsel's parser to a request's Session object, and makes switching between them seamless. Handles cookie, proxy and header transfer.
rio-client 0.3.0 1 Client for Rio.
scrapekit 0.2.1 1 Light-weight tools for web scraping
shub-image 0.2.5 1 Scrapinghub release tool
social 0.1.dev3 1 Social is a python package for social data analysis and exploitation
soft404 0.2.1 1 A classifier for detecting soft 404 pages
spidy-web-crawler 1.6.5 1 Spidy is the simple, easy to use command line web crawler.
splash 3.2 1 A JavaScript renderer with an HTTP API
splinter_model 0.1.6 1 Splinter helper to create scrapers from models
stereo 0.1.2 1 Generate documents from CSV file in batch
structominer 0.2.0 1 Data scraping for a more civilized age
w3lib 1.19.0 1 Library of web-related functions
wallapopy 1.0.3 1 A Python client for Wallapop

*: occurrence of search term weighted by field (name, summary, keywords, description, author, maintainer)