scrapelib 0.7.3

a library for scraping things

Latest Version: 0.9.1


scrapelib is a library for making requests to websites, particularly those
that may be less-than-reliable.

scrapelib originated as part of the Open States project to scrape the
websites of all 50 state legislatures, and as a result was designed with
features desirable when dealing with sites that have intermittent errors or
require rate-limiting.

As of version 0.7, scrapelib has been retooled to take advantage of the superb
requests library.

Advantages of using scrapelib over alternatives like httplib2 or simply using
requests as-is:

* All of the power of the superb requests library
* HTTP, HTTPS, and FTP requests via an identical API
* support for simple caching with pluggable cache backends
* request throttling
* configurable retries for non-permanent site failures
* optional robots.txt compliance
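
The throttling and retry items in the list above can be pictured with a small, self-contained sketch. This is not scrapelib's implementation: ``ThrottledFetcher``, its parameters, and the doubling retry wait are hypothetical names and choices made for illustration, and the ``fetch`` callable stands in for whatever actually performs the request.

```python
import time


class ThrottledFetcher:
    """Hypothetical helper (not part of scrapelib): rate-limits and retries calls."""

    def __init__(self, fetch, requests_per_minute=60, retry_attempts=2,
                 retry_wait_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch  # callable that takes a URL and returns a response
        self.min_interval = 60.0 / requests_per_minute
        self.retry_attempts = retry_attempts
        self.retry_wait_seconds = retry_wait_seconds
        self.clock = clock  # injectable so the timing can be tested without waiting
        self.sleep = sleep
        self._last_request = None

    def _throttle(self):
        # Wait until at least min_interval has passed since the previous request.
        if self._last_request is not None:
            wait = self.min_interval - (self.clock() - self._last_request)
            if wait > 0:
                self.sleep(wait)
        self._last_request = self.clock()

    def get(self, url):
        # Try once, then retry up to retry_attempts times, doubling the wait each time.
        for attempt in range(self.retry_attempts + 1):
            self._throttle()
            try:
                return self.fetch(url)
            except IOError:
                if attempt == self.retry_attempts:
                    raise  # out of retries: let the caller see the failure
                self.sleep(self.retry_wait_seconds * (2 ** attempt))
```

Passing the clock and sleep functions in as arguments keeps the sketch easy to exercise with fake time; in real use the defaults (``time.monotonic`` and ``time.sleep``) would apply.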

scrapelib is a project of Sunlight Labs (c) 2012.
All code is released under a BSD-style license, see LICENSE for details.

Written by James Turk

Contributors:

    * Michael Stephens - initial urllib2/httplib2 version
    * Joe Germuska - fix for IPython embedding
    * Alex Chiang - fix to test suite


Requirements

* python 2.6, 2.7, or 3.2
* requests


Installation

scrapelib is available on PyPI and can be installed via ``pip install scrapelib``

PyPI package:



Example Usage


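  import scrapelib
  # follow_robots=True turns on the optional robots.txt compliance
  s = scrapelib.Scraper(requests_per_minute=10, allow_cookies=True,
                        follow_robots=True)

  # Grab Google front page
  s.urlopen('http://google.com')

  # Will raise RobotExclusionError
  s.urlopen('http://google.com/search')

  # Will be throttled to 10 HTTP requests per minute
  # (example.com is a placeholder URL)
  while True:
      s.urlopen('http://example.com')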
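
To make the robots.txt behaviour in the example more concrete, here is a minimal sketch of the kind of check involved, written against the Python 3 standard library (``urllib.robotparser``; the module is named ``robotparser`` on Python 2). It is an illustration only, not scrapelib's code: the ``check_allowed`` function, the local ``RobotExclusionError`` class, and the inlined robots.txt content are all assumptions made so the sketch runs without network access.

```python
from urllib.robotparser import RobotFileParser


class RobotExclusionError(Exception):
    """Stand-in defined here for illustration; not imported from scrapelib."""


def check_allowed(robots_txt, user_agent, url):
    """Raise RobotExclusionError if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        raise RobotExclusionError("%s is disallowed for %s" % (url, user_agent))


# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

# The front page is allowed, so this call returns quietly.
check_allowed(ROBOTS_TXT, "example-bot", "http://example.com/")

# Anything under /search is disallowed, so this call raises.
try:
    check_allowed(ROBOTS_TXT, "example-bot", "http://example.com/search")
except RobotExclusionError as exc:
    print("blocked:", exc)
```

A scraper that performs this check before every request fails fast on disallowed URLs instead of fetching them, which is the behaviour the second call in the usage example relies on.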

File                          Type    Py Version  Uploaded on  Size
scrapelib-0.7.3.tar.gz (md5)  Source              2012-06-21   12KB