Skip to main content

A light-weight analytical engine for OLAP processing

Project description

# Babbage Analytical Engine

[![Gitter](https://img.shields.io/gitter/room/openspending/chat.svg)](https://gitter.im/openspending/chat)
[![Build Status](https://travis-ci.org/openspending/babbage.svg?branch=master)](https://travis-ci.org/spendb/babbage)
[![Coverage Status](https://coveralls.io/repos/openspending/babbage/badge.svg?branch=master&service=github)](https://coveralls.io/github/openspending/babbage?branch=master)

``babbage`` is a lightweight implementation of an OLAP-style database
query tool for PostgreSQL. Given a database schema and a logical model
of the data, it can be used to perform analytical queries against that
data - programmatically or via a web API.

It is heavily inspired by [Cubes](http://cubes.databrewery.org/) but
has less ambitious goals, i.e. no pre-computation of aggregates, or
multiple storage backends.

``babbage`` is not specific to government finances, and could easily be used e.g. for ReGENESIS, a project that makes German national statistics available via an API. The API functions by interpreting modelling metadata generated by the user (measures and dimensions).

## Installation and test

``babbage`` will normally included as a PyPI dependency, or installed via
``pip``:

```bash
$ pip install babbage
```

People interested in contributing to the package should instead check out the
source repository and then use the provided ``Makefile`` to install the
library (this requires ``virtualenv`` to be installed):

```bash
$ git clone https://github.com/spendb/babbage.git
$ cd babbage
$ make install
$ make test
```

## Usage

``babbage`` is used to query a set of existing database tables, using an
abstract, logical model to query them. A sample of a logical model can be
found in ``tests/fixtures/models/cra.json``, and a JSON schema specifying
the model is available in ``babbage/schema/model.json``.

The central unit of ``babbage`` is a ``Cube``, i.e. a [OLAP cube](https://en.wikipedia.org/wiki/OLAP_cube) that uses the provided model metadata to construct queries
against a database table. Additionally, the application supports managing
multiple cubes at the same time via a ``CubeManager``, which can be
subclassed to enable application-specific ways of defining cubes and where
their metadata is stored.

Futher, ``babbage`` includes a Flask Blueprint that can be used to expose
a standard API via HTTP. This API is consumed by the JavaScript ``babbage.ui``
package and it is very closely modelled on the Cubes and OpenSpending HTTP
APIs.

### Programmatic usage

Let's assume you have an existing database table of procurement data and
want to query it using ``babbage`` in a Python shell. A session might look
like this:

```python
import json
from sqlalchemy import create_engine
from babbage.cube import Cube
from babbage.model import Measure

engine = create_engine('postgresql://localhost/procurement')
model = json.load(open('procurement_model.json', 'r'))

cube = Cube(engine, 'procurement', model)
facts = cube.facts(page_size=5)

# There are 17201 rows in the table:
assert facts['total_fact_count'] == 17201

# There's a field called 'total_value':
assert 'total_value' in facts['fields']

# We can get metadata about it:
concept = cube.model['total_value']
assert isinstance(concept, Measure)
assert concept.label == 'Total Value'

# And there's some actual data:
assert len(facts['data']) == 5
fact_0 = facts['data'][0]
assert 'total_value' in fact_0

# For dimensions, we can get all the distinct values:
members = cube.members('supplier', cut='year:2015', page_size=500)
assert len(members['data']) <= 500
assert members['total_member_count']

# And, finally, we can aggregate by specific dimensions:
aggregate = cube.aggregate(aggregates='total_value.sum',
drilldowns='supplier|authority'
cut='year:2015|authority.country:GB',
page_size=500)
# This translates to:
# Aggregate the procurement data by summing up the 'total_value'
# for each unique pair of values in the 'supplier' and 'authority'
# dimensions, and filter for only those entries where the 'year'
# dimensions key attribute is '2015' and the 'authority' dimensions
# 'country' attribute is 'GB'. Return the first 500 results.
assert aggregate['total_cell_count']
assert len(aggregate['cells']) <= 500
aggregate_0 = aggregate['cells'][0]
assert 'total_value.sum' in aggregate_0

# Note that these attribute names are made up for this example, they
# should be reflected from the model:
assert 'supplier.code' in aggregate_0
assert 'supplier.label' in aggregate_0
assert 'authority.code' in aggregate_0
assert 'authority.label' in aggregate_0
```

### Using the HTTP API

The HTTP API for ``babbage`` is a simple Flask [Blueprint](http://flask.pocoo.org/docs/latest/blueprints/) used to expose a small set of calls that correspond to
the cube functions listed above. To include it into an existing Flask
application, you would need to create a ``CubeManager`` and then
configure the API like this:

```python
from flask import Flask
from sqlalchemy import create_engine
from babbage.manager import JSONCubeManager
from babbage.api import configure_api

app = Flask('demo')
engine =
models_directory = 'models/'
manager = JSONCubeManager(engine, models_directory)
blueprint = configure_api(app, manager)
app.register_blueprint(blueprint, url_prefix='/api/babbage')

app.run()
```

Of course, you can define your own ``CubeManager``, for example if
you wish to retrieve model metadata from a database.

When enabled, the API will expose a number of JSON(P) endpoints
relative to the given ``url_prefix``:

* ``/``, returns the system status and version.
* ``/cubes``, returns a list of the available cubes (name only).
* ``/cubes/<name>/model``, returns full metadata for a given
cube (i.e. measures, dimensions, aggregates etc.)
* ``/cubes/<name>/facts`` is used to return individual entries from
the cube in a non-aggregated form. Supports filters (``cut``), a
set of ``fields`` to return and a ``sort`` (``field_name:direction``),
as well as ``page`` and ``page_size``.
* ``/cubes/<name>/members`` is used to return the distinct set of
values for a given dimension, e.g. all the suppliers mentioned in
a procurement dataset. Supports filters (``cut``), a and a ``sort``
(``field_name:direction``), as well as ``page`` and ``page_size``.
* ``/cubes/<name>/aggregate`` is the main endpoint for generating
aggregate views of the data. Supports specifying the ``aggregates``
to include, the ``drilldowns`` to aggregate by, a set of filters
(``cut``), a and a ``sort`` (``field_name:direction``), as well
as ``page`` and ``page_size``.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

babbage-0.2.0.tar.gz (16.5 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page