babbage

A light-weight analytical engine for OLAP processing

These details have been verified by PyPI

Maintainers

akariv brew okfn pudo vitorbaptista

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 2.7
- Python :: 3.3

Project description

# Babbage Analytical Engine

[![Gitter](https://img.shields.io/gitter/room/openspending/chat.svg)](https://gitter.im/openspending/chat)
[![Build Status](https://travis-ci.org/openspending/babbage.svg?branch=master)](https://travis-ci.org/spendb/babbage)
[![Coverage Status](https://coveralls.io/repos/openspending/babbage/badge.svg?branch=master&service=github)](https://coveralls.io/github/openspending/babbage?branch=master)

``babbage`` is a lightweight implementation of an OLAP-style database
query tool for PostgreSQL. Given a database schema and a logical model
of the data, it can be used to perform analytical queries against that
data - programmatically or via a web API.

It is heavily inspired by [Cubes](http://cubes.databrewery.org/) but
has less ambitious goals, i.e. no pre-computation of aggregates, or
multiple storage backends.

``babbage`` is not specific to government finances, and could easily be used e.g. for ReGENESIS, a project that makes German national statistics available via an API. The API functions by interpreting modelling metadata generated by the user (measures and dimensions).

## Installation and test

``babbage`` will normally included as a PyPI dependency, or installed via
``pip``:

```bash
$ pip install babbage
```

People interested in contributing to the package should instead check out the
source repository and then use the provided ``Makefile`` to install the
library (this requires ``virtualenv`` to be installed):

```bash
$ git clone https://github.com/spendb/babbage.git
$ cd babbage
$ make install
$ make test
```

## Usage

``babbage`` is used to query a set of existing database tables, using an
abstract, logical model to query them. A sample of a logical model can be
found in ``tests/fixtures/models/cra.json``, and a JSON schema specifying
the model is available in ``babbage/schema/model.json``.

The central unit of ``babbage`` is a ``Cube``, i.e. a [OLAP cube](https://en.wikipedia.org/wiki/OLAP_cube) that uses the provided model metadata to construct queries
against a database table. Additionally, the application supports managing
multiple cubes at the same time via a ``CubeManager``, which can be
subclassed to enable application-specific ways of defining cubes and where
their metadata is stored.

Futher, ``babbage`` includes a Flask Blueprint that can be used to expose
a standard API via HTTP. This API is consumed by the JavaScript ``babbage.ui``
package and it is very closely modelled on the Cubes and OpenSpending HTTP
APIs.

### Programmatic usage

Let's assume you have an existing database table of procurement data and
want to query it using ``babbage`` in a Python shell. A session might look
like this:

```python
import json
from sqlalchemy import create_engine
from babbage.cube import Cube
from babbage.model import Measure

engine = create_engine('postgresql://localhost/procurement')
model = json.load(open('procurement_model.json', 'r'))

cube = Cube(engine, 'procurement', model)
facts = cube.facts(page_size=5)

# There are 17201 rows in the table:
assert facts['total_fact_count'] == 17201

# There's a field called 'total_value':
assert 'total_value' in facts['fields']

# We can get metadata about it:
concept = cube.model['total_value']
assert isinstance(concept, Measure)
assert concept.label == 'Total Value'

# And there's some actual data:
assert len(facts['data']) == 5
fact_0 = facts['data'][0]
assert 'total_value' in fact_0

# For dimensions, we can get all the distinct values:
members = cube.members('supplier', cut='year:2015', page_size=500)
assert len(members['data']) <= 500
assert members['total_member_count']

# And, finally, we can aggregate by specific dimensions:
aggregate = cube.aggregate(aggregates='total_value.sum',
drilldowns='supplier|authority'
cut='year:2015|authority.country:GB',
page_size=500)
# This translates to:
# Aggregate the procurement data by summing up the 'total_value'
# for each unique pair of values in the 'supplier' and 'authority'
# dimensions, and filter for only those entries where the 'year'
# dimensions key attribute is '2015' and the 'authority' dimensions
# 'country' attribute is 'GB'. Return the first 500 results.
assert aggregate['total_cell_count']
assert len(aggregate['cells']) <= 500
aggregate_0 = aggregate['cells'][0]
assert 'total_value.sum' in aggregate_0

# Note that these attribute names are made up for this example, they
# should be reflected from the model:
assert 'supplier.code' in aggregate_0
assert 'supplier.label' in aggregate_0
assert 'authority.code' in aggregate_0
assert 'authority.label' in aggregate_0
```

### Using the HTTP API

The HTTP API for ``babbage`` is a simple Flask [Blueprint](http://flask.pocoo.org/docs/latest/blueprints/) used to expose a small set of calls that correspond to
the cube functions listed above. To include it into an existing Flask
application, you would need to create a ``CubeManager`` and then
configure the API like this:

```python
from flask import Flask
from sqlalchemy import create_engine
from babbage.manager import JSONCubeManager
from babbage.api import configure_api

app = Flask('demo')
engine =
models_directory = 'models/'
manager = JSONCubeManager(engine, models_directory)
blueprint = configure_api(app, manager)
app.register_blueprint(blueprint, url_prefix='/api/babbage')

app.run()
```

Of course, you can define your own ``CubeManager``, for example if
you wish to retrieve model metadata from a database.

When enabled, the API will expose a number of JSON(P) endpoints
relative to the given ``url_prefix``:

* ``/``, returns the system status and version.
* ``/cubes``, returns a list of the available cubes (name only).
* ``/cubes/<name>/model``, returns full metadata for a given
cube (i.e. measures, dimensions, aggregates etc.)
* ``/cubes/<name>/facts`` is used to return individual entries from
the cube in a non-aggregated form. Supports filters (``cut``), a
set of ``fields`` to return and a ``sort`` (``field_name:direction``),
as well as ``page`` and ``page_size``.
* ``/cubes/<name>/members`` is used to return the distinct set of
values for a given dimension, e.g. all the suppliers mentioned in
a procurement dataset. Supports filters (``cut``), a and a ``sort``
(``field_name:direction``), as well as ``page`` and ``page_size``.
* ``/cubes/<name>/aggregate`` is the main endpoint for generating
aggregate views of the data. Supports specifying the ``aggregates``
to include, the ``drilldowns`` to aggregate by, a set of filters
(``cut``), a and a ``sort`` (``field_name:direction``), as well
as ``page`` and ``page_size``.

Project details

These details have been verified by PyPI

Maintainers

akariv brew okfn pudo vitorbaptista

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 2.7
- Python :: 3.3

Release history Release notifications | RSS feed

0.4.0

Jul 25, 2020

0.3.6

Apr 30, 2018

0.3.5

Mar 12, 2018

0.3.4

Mar 11, 2018

0.3.3

Sep 27, 2017

0.3.2

Sep 27, 2017

0.3.1

Aug 19, 2017

0.3.0

Aug 10, 2017

0.2.3

Aug 8, 2017

0.2.2

Aug 8, 2017

0.2.1

Apr 21, 2017

This version

0.2.0

Jul 5, 2016

0.1.1

Aug 21, 2015

0.1.0

Aug 21, 2015

0.0.1

Jul 10, 2015

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

babbage-0.2.0.tar.gz (16.5 kB view hashes)

Uploaded Jul 5, 2016 Source

Hashes for babbage-0.2.0.tar.gz

Hashes for babbage-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`4ce49a417dce3c629c64870c048831b856ac8319586ff8ed053dbae6c302b651`
MD5	`8d9115b2fb3381920a9ac4b08a691b98`
BLAKE2b-256	`8676498510c34a3323bd0625e58481241bacc9bc4ec7d0d9309327e64afd0457`