
cache_requests 4.0.0

Simple. Powerful. Persistent LRU caching for the requests library.


Features

  • Drop-in decorator for the requests library.
  • Automatic timer-based expiration on stored items (optional).
  • Backed by Yahoo’s powerful redislite.
  • Scalable with redis. Optionally accepts a redis connection.
  • Exposes the powerful underlying Memoize decorator to decorate any function.
  • Tested with high coverage.
  • Lightweight. Simple logic.
  • Lightning fast.
  • Jump start your development cycle.
  • Collect and reuse entire response objects.

Installation

At the command line, via either pip or easy_install:

$ pip install cache_requests
$ easy_install cache_requests

Or, if you have virtualenvwrapper installed:

$ mkvirtualenv cache_requests
$ pip install cache_requests

Uninstall

$ pip uninstall cache_requests

Usage

To use cache_requests in a project

import cache_requests

Quick Start

Create a Session and use it in place of the requests module:

>>> from cache_requests import Session
>>> requests = Session()
>>>
>>> # from python-requests.org
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'
>>> r.json()
{u'private_gists': 419, u'total_private_repos': 77, ...}

Config Options

Decorated Methods

method.ex
sets the default expiration (seconds) for new cache entries.
method.redis
creates the connection to the redis or redislite database. By default this is a redislite connection. However, a redis connection can be dropped in for easy scalability.

cache_requests.Session

  • ex is shared between request methods. It can be accessed as Session.cache.ex or Session.get.ex, where get is the requests.get method.
  • By default, requests that return an error are not cached. This can be changed by overriding Session.cache.set_cache_cb to return False. The callback takes the response object as an argument:

    from cache_requests import Session

    requests = Session()

    requests.cache.set_cache_cb = lambda _: False

  • By default, only safe (read-only) methods are cached (get, head, options). Caching can be switched on or off for each method through the Session.cache config option.

These flags are accessed through the Session object as Session.cache.[method name]. They can all be overridden at once with the Session.cache.all setting.

For example

from cache_requests import Session

requests = Session()

requests.cache.delete = True

# cached, only called once.
requests.delete('http://google.com')
requests.delete('http://google.com')

requests.cache.delete = False

# not cached, called twice.
requests.delete('http://google.com')
requests.delete('http://google.com')

# cache ALL methods
requests.cache.all = True

# don't cache any methods
requests.cache.all = False

# Use individual method cache options.
requests.cache.all = None

Default settings

Method     Cached
get        True
head       True
options    True
post       False
put        False
patch      False
delete     False
all        None
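
The precedence between all and the per-method flags can be pictured with a small stand-alone sketch. CacheFlags and should_cache are hypothetical names used only for illustration; they are not part of the cache_requests API, and only the default values are taken from the settings listed above.

```python
# Illustrative only: CacheFlags and should_cache are hypothetical names,
# not part of the cache_requests API.
class CacheFlags(object):
    """Per-method cache switches, using the documented defaults."""

    def __init__(self):
        self.get = True
        self.head = True
        self.options = True
        self.post = False
        self.put = False
        self.patch = False
        self.delete = False
        self.all = None  # None means "defer to the per-method flags"


def should_cache(flags, method):
    """Return True if a request made with `method` should be cached."""
    if flags.all is not None:
        # A non-None `all` overrides every per-method flag.
        return bool(flags.all)
    return bool(getattr(flags, method.lower()))


flags = CacheFlags()
print(should_cache(flags, 'GET'))     # True: per-method default
flags.all = True
print(should_cache(flags, 'DELETE'))  # True: overridden by `all`
```

Setting all back to None hands control back to the individual flags, which matches the "Use individual method cache options" case in the example above.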

Function Level Config

Cache Busting
Use keyword bust_cache=True in a memoized function to force reevaluation.
Conditionally Set Cache
Use keyword set_cache to provide a callback. The callback takes the result of the function as an argument and must return a bool. Alternatively, a plain True or False can be passed.
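
To make the two keywords concrete, here is a minimal sketch of how a memoizing decorator could honor them. It uses a plain dict instead of redis, and the memoize function below is a stand-in written for this example, not the library's Memoize class; the lookup function and its status values are likewise made up.

```python
import functools


def memoize(func):
    """Dict-backed stand-in for a memoizing decorator (not cache_requests' Memoize)."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Pull the control keywords out so they never reach the wrapped function.
        bust_cache = kwargs.pop('bust_cache', False)
        set_cache = kwargs.pop('set_cache', True)
        key = (args, tuple(sorted(kwargs.items())))

        if not bust_cache and key in store:
            return store[key]

        result = func(*args, **kwargs)
        # set_cache may be a plain bool or a callback that inspects the result.
        keep = set_cache(result) if callable(set_cache) else set_cache
        if keep:
            store[key] = result
        return result

    return wrapper


calls = []


@memoize
def lookup(code):
    calls.append(code)
    return {'status': code}


lookup(200)
lookup(200)                    # served from the cache
lookup(200, bust_cache=True)   # forces reevaluation
lookup(500, set_cache=lambda result: result['status'] < 400)
lookup(500)                    # the earlier 500 result was never stored, so this runs again
print(calls)                   # [200, 200, 500, 500]
```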

Use Case Scenarios

Development: 3rd Party APIs

Scenario:
Working on a project that uses a 3rd party API or service.
Things you want:
  • A cache that persists between sessions and is lightning fast.
  • Ability to rapidly explore the API and its parameters.
  • Ability to inspect and debug response content.
  • Ability to focus on progress.
  • Perfect transition to a production environment.
Things you don’t want:
  • Dependency on network and server stability for development.
  • Spamming the API. Especially APIs with limits.
  • Responses that change in non-meaningful ways.
  • Burning energy with copypasta or fake data to run a piece of your program.
  • Slow. Responses.

Make a request one time. Cache the results for the rest of your work session.

import os

if os.environ.get('ENV') == 'DEVELOP':
    from cache_requests import Session

    requests = Session(ex=60 * 60)  # Set expiration, 60 min
else:
    import requests

# strange, complicated request you might make
headers = {"accept-encoding": "gzip, deflate, sdch", "accept-language": "en-US,en;q=0.8"}
payload = dict(sourceid="chrome-instant", ion="1", espv="2", ie="UTF-8", client="ubuntu",
               q="hash%20a%20dictionary%20python")
response = requests.get('http://google.com/search', headers=headers, params=payload)

# spam to prove a point
response = requests.get('http://google.com/search', headers=headers, params=payload)
response = requests.get('http://google.com/search', headers=headers, params=payload)
response = requests.get('http://google.com/search', headers=headers, params=payload)
response = requests.get('http://google.com/search', headers=headers, params=payload)
response = requests.get('http://google.com/search', headers=headers, params=payload)
response = requests.get('http://google.com/search', headers=headers, params=payload)
response = requests.get('http://google.com/search', headers=headers, params=payload)

# tweak your query, we're exploring here
payload = dict(sourceid="chrome-instant", ion="1", espv="2", ie="UTF-8", client="ubuntu",
               q="hash%20a%20dictionary%20python2")
# do you see what changed? the caching tool did.
response = requests.get('http://google.com/search', headers=headers, params=payload)
response = requests.get('http://google.com/search', headers=headers, params=payload)
response = requests.get('http://google.com/search', headers=headers, params=payload)

Production: Web Scraping

Automatically expire old content.

  • How often? After a day? A week? A month? All of this logic is built in with the Session.cache.ex setting.
  • Effectively, it can manage all of the time-based rotation.
  • Perfect if there's more data than your API caps allow.
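
The effect of an expiration setting can be sketched without redis. The ExpiringStore class below is a hypothetical in-memory stand-in that only mimics the behavior described above; in the library the store is redislite or redis. The clock is injected so the example can fake the passage of time.

```python
import time


class ExpiringStore(object):
    """In-memory stand-in for a store with per-entry expiration (hypothetical)."""

    def __init__(self, ex, clock=time.time):
        self.ex = ex          # lifetime of new entries, in seconds
        self.clock = clock    # injectable so the example can fake time
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ex)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop it so the caller fetches fresh content.
            del self._data[key]
            return default
        return value


now = [0.0]
store = ExpiringStore(ex=7 * 24 * 60 * 60, clock=lambda: now[0])  # 1 week
store.set('page-1', '<html>...</html>')

now[0] += 6 * 24 * 60 * 60       # six days later: still cached
print(store.get('page-1'))
now[0] += 2 * 24 * 60 * 60       # eight days later: expired
print(store.get('page-1'))
```

A scraper built on this idea never needs its own rotation logic: a miss after expiry simply triggers a fresh request.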

One line of code to use a full redis database.

  • Try redislite; it can handle quite a bit. The redislite API used by this module is 1:1 with the redis package. Just replace the connection parameter/config value.
  • redis is a drop-in:
    connection = redis.StrictRedis(host='localhost', port=6379, db=0)
    requests = Session(connection=connection)
  • Everything else just works. There's no magic required.

import redis

from cache_requests import Session

connection = redis.StrictRedis(host='localhost', port=6379, db=0)
ex = 7 * 24 * 60 * 60  # 1 week

requests = Session(ex=ex, connection=connection)

for i in range(1000):
    payload = dict(q=i)
    response = requests.get('http://google.com/search', params=payload)
    print(response.text)

Usage: memoize

from cache_requests import Memoize

@Memoize(ex=15 * 60)  # 15 min; the default is 60 min
def amazing_but_expensive_function(*args, **kwargs):
    print("You're going to like this")
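
A persistent memoizer needs cache keys that stay the same between interpreter runs. The History section notes that MD5 is used for hashing to avoid the PYTHONHASHSEED issue (the built-in hash() of a string can differ from one process to the next) and that each function gets its own cache in a shared db. A rough sketch of that idea follows; make_key is a hypothetical helper, and using repr() to serialize the arguments is an assumption that only holds for simple argument types.

```python
import hashlib


def make_key(func_name, args, kwargs):
    """Build a cache key that is stable across interpreter runs (hypothetical helper)."""
    # Sort keyword arguments so call order does not change the key.
    # Assumption: repr() is deterministic for the argument types in use.
    payload = repr((func_name, args, sorted(kwargs.items())))
    return hashlib.md5(payload.encode('utf-8')).hexdigest()


key_a = make_key('search', ('python',), {'page': 1, 'lang': 'en'})
key_b = make_key('search', ('python',), {'lang': 'en', 'page': 1})
key_c = make_key('other_search', ('python',), {'lang': 'en', 'page': 1})

print(key_a == key_b)  # True: keyword order does not matter
print(key_a == key_c)  # False: each function gets its own keys
```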

Credits

Tools used in rendering this package:

History

Next Release

  • Stay tuned.

4.0.0 (2015-12-25)

  • Fix: Use MD5 for hash to avoid PYTHONHASHSEED issue.
  • Fix: Give default dbfilename a more unique name, based on caller.
  • BREAKING: Move Session.ex and Session.connection to Session.cache config object.
  • Updated examples. New example demonstrates Memoize decorator.
  • Updated requirements.

3.0.0 (2015-12-22)

  • Feature: Cache busting! Use keyword argument bust_cache=True to force reevaluation.
  • Feature: Session automatically skips caching error responses.
  • Feature: Callback argument to decide if results should be cached.
  • Feature: Decorated Session methods share a centralized configuration per session.
  • BREAKING: Remove global config in favor of component-level config. Reasoning: Global config adds way too much complexity and too little value. (Everything needs to lazy-load the config at the last moment.)
  • Fix: Unique cache per function in shared db.
  • Fix: Tweaks to keep the classes subclassable.
  • Fix: Cleaned up tests.
  • Updated requirements.

2.0.0 (2015-12-12)

  • API completely rewritten
  • New API extends requests internals as opposed to monkeypatching.
  • Entire package is redesigned to be more maintainable, more modular, and more usable.
  • Dependencies are pinned.
  • Tests are expanded.
  • PY26 and PY32 support is dropped, because of dependency constraints.
  • PY35 support is added.
  • Docs are rewritten.
  • Move towards idiomatic code.
  • 2.0.6 Fix broken coverage, broken rst render.

1.0.0 (2015-04-23)

  • First real release.
  • Feature/ Unit test suite, very high coverage.
  • Feature/ redislite integration.
  • Feature/ Documentation. https://cache-requests.readthedocs.org.
  • Feature/ Exposed the beefed up Memoize decorator.
  • Feature/ Upgraded compatibility to:
    • PY26
    • PY27
    • PY33
    • PY34
    • PYPY
  • Added examples and case studies.

0.1.0 (2015-04-19)

  • First release on PyPI.
 
File                                        Type          Py Version  Uploaded on  Size
cache_requests-4.0.0-py2.py3-none-any.whl   Python Wheel  py2.py3     2015-12-26   16KB
cache_requests-4.0.0.tar.gz                 Source                    2015-12-26   29KB