
Simple HTTP cache for Python/WSGI applications with fine-grained invalidation.

It enables caching of HTTP resources in your application with smart control of invalidation through tags. If you know what invalidates your cache entries and have a relatively low turnover (e.g. your reads outnumber your writes), you can benefit greatly from wsgi-accelerator. See the Use-case section for a more detailed description.

Caching at the HTTP level is really powerful since it allows you to cache responses that are already rendered/serialized and compressed, ready to be sent without additional computation for each request.

Features

  • Cache read-heavy URLs

  • ETag support

  • Fine-grained, tag-based cache invalidation

  • Time-based cache invalidation (à la Cache-Control)

  • Pluggable cache stores - in-memory and Redis stores included (see the sketch below this list)

  • Super simple and hackable - less than 200 LOC excluding tests

  • MIT licensed
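
Because stores are pluggable, a backend essentially only has to know how to fetch a cached response, save one together with its tags and TTL, and drop everything associated with a tag. The actual store interface is defined by the library and is not documented here; the sketch below is only an illustration of the idea, and the TinyStore class and its method names are hypothetical.

# A minimal, hypothetical store illustrating what a pluggable backend has
# to keep track of: cached responses, their expiry times and the tags
# attached to them. This is NOT the wsgi-accelerator store interface,
# just a sketch of the concept.
import time

class TinyStore:
    def __init__(self):
        self.entries = {}   # cache key -> (response, expires_at)
        self.tags = {}      # tag -> set of cache keys

    def set(self, key, response, cache_for, tags):
        self.entries[key] = (response, time.time() + cache_for)
        for tag in tags:
            self.tags.setdefault(tag, set()).add(key)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        response, expires_at = entry
        if time.time() > expires_at:        # time-based invalidation
            del self.entries[key]
            return None
        return response

    def invalidate_tag(self, tag):          # fine-grained invalidation
        for key in self.tags.pop(tag, set()):
            self.entries.pop(key, None)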

Getting started

Right now, wsgi-accelerator is not available through PyPI as it is still receiving tweaks. Once it is published, installation will be the usual pip install scenario.

Once you have installed it, you can enable it with the default in-memory store like this:

from accelerator import WSGIAccelerator

def app(environ, start_response):
    response_body = b'Hello World'  # WSGI response bodies must be bytes
    status = '200 OK'

    response_headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(response_body)))
    ]

    environ['accelerator.cache_for'] = 5 # Cache for five seconds
    environ['accelerator.tags'] = ['foo', 'bar']
    start_response(status, response_headers)
    return [response_body]

app = WSGIAccelerator(app)

# To invalidate the response generated by the WSGI app above, just call:
app.invalidate_tag(['foo'])

# This will trigger invalidation of that URL path + query since it was
# tagged when the response was generated.

Please note that the in-memory store is not intended for production use. It is merely for testing and to serve as a correct reference implementation.

Maybe it goes without saying, but responses cached through wsgi-accelerator must NOT contain any user-specific data at all.
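
In practice this means only opting in to caching for responses that look the same for every visitor. One way to do that, sketched below, is to set accelerator.cache_for only on public responses; the is_anonymous helper is a placeholder for whatever authentication check your application already has.

from accelerator import WSGIAccelerator

def is_anonymous(environ):
    # Placeholder check: a real app would inspect its own session/auth state.
    return 'HTTP_AUTHORIZATION' not in environ

def app(environ, start_response):
    body = b'Public landing page'
    headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body)))
    ]

    # Only cache when no user is logged in, so no user-specific data can
    # end up in the shared cache.
    if is_anonymous(environ):
        environ['accelerator.cache_for'] = 60  # cache for one minute
        environ['accelerator.tags'] = ['landing-page']
    # Leaving the keys unset means the response is passed through uncached.

    start_response('200 OK', headers)
    return [body]

app = WSGIAccelerator(app)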

Use-case

Imagine you have a popular site with lots of users and social features. Having a popular site also means you serve quite a lot of traffic, meaning large costs in terms of hardware and bandwidth. One of the reasons for this is that you usually need to serve fresh content for every HTTP request from your servers, since the content could have changed.

Other sites that are also high-traffic but more static in nature, such as Wikipedia or a newspaper, usually reach for an HTTP cache like Varnish to help cope with this problem. This works incredibly well and can lessen CPU load quite significantly. It is also worth noting that caching only realizes its full potential when reads outnumber expensive writes.

But for a dynamic site with more user-generated content, caching at the HTTP level is usually a lot trickier, if not impossible. So most caching ends up taking place at the “business/domain logic” level inside the application. Such caching really does help save resources, but it still requires each HTTP request to trigger code paths deep in your application, resulting in burned CPU cycles.

A good solution to this problem would be to enable HTTP-level caching of volatile, user-generated content. And that’s where wsgi-accelerator comes in :)

An example

On this imaginary site, we have a user profile page similar to this:

http://cgbystrom.com/static/img/profile_page.png

Where each yellow bubble represents a data source that makes up part of the final page.

If we were to cache this page through traditional means, it would go stale as soon as the user did an action that changed any of the above. Clearly less than optimal. But the good thing is that we know exactly when those actions can and will take place. Using this information, we can invalidate the cache entry for that page as soon as the user triggers any of those events. The next time the page is loaded it hits the application as normal, getting rendered and cached with the newly changed data. This concept of having fine-grained ways of triggering invalidation of cached content enables you to cache entire HTTP responses without the risk of serving stale data.
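
As a sketch of how this could look with wsgi-accelerator, the profile page response is tagged per user and the write path invalidates that tag. The tag name, the hard-coded user id and the post_status_update function are made up for illustration; only the environ keys and invalidate_tag call come from the example above.

from accelerator import WSGIAccelerator

def profile_app(environ, start_response):
    body = b'Rendered profile page for user 42'
    environ['accelerator.cache_for'] = 3600       # fall-back TTL of one hour
    environ['accelerator.tags'] = ['profile:42']  # hypothetical per-user tag
    start_response('200 OK', [
        ('Content-Type', 'text/html'),
        ('Content-Length', str(len(body)))
    ])
    return [body]

app = WSGIAccelerator(profile_app)

def post_status_update(user_id, text):
    # ... write the status update to the database ...
    # Then drop every cached page tagged with this user's profile, so the
    # next request re-renders and re-caches it with the fresh data.
    app.invalidate_tag(['profile:%d' % user_id])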

And with this being a normal WSGI middleware, no extra servers/proxies are needed except for a possible backing cache store.

License

Open source licensed under the MIT license (see LICENSE file for details).

If you find this idea and library useful, please let me know. Would love to hear your story and use-case.

Carl Byström (@cgbystrom)
