Skip to main content

A fast, append-only storage layer for numeric data that changes over time

Project description

## Gauged

A fast, append-only storage layer for gauges, counters, timers and other numeric data types that change over time.

Features:

- Cache-aware data structures and algorithms for speed and memory-efficiency.
- Efficient range queries and roll-ups of any size down to the configurable resolution of 1 second.
- Use either `MySQL`, `PostgreSQL` or `SQLite` as a backend.

## Installation

The library can be installed with `easy_install` or `pip`

```bash
$ pip install gauged
```

Python 2.7.x (CPython or PyPy) is required.

## Backends

You can use either `MySQL`, `PostgreSQL` or `SQLite` as a backend. If no URL is specified than a SQLite-based in-memory database will be used

```python
from gauged import Gauged

# any one of:
gauged = Gauged('mysql://root@localhost/gauged')
gauged = Gauged('postgresql://postgres@localhost/gauged')
gauged = Gauged('sqlite:////tmp/gauged.db')
gauged = Gauged()
```

On first run you'll need to create the schema

```python
gauged.sync()
```

## Writing data

The library has no concept of data types when writing. Rather, you store your counter, gauge and timer data in the same way, as a `(key, float_value)` pair, and then use the appropriate method when reading the data back

```python
with gauged.writer as writer:
writer.add({ 'requests': 1, 'response_time': 0.45, 'memory_usage': 145.6 }, timestamp=1389747759902)
writer.add({ 'requests': 1, 'response_time': 0.25, 'memory_usage': 148.3 }, timestamp=1389747760456)
```

The timestamp can be omitted (it defaults to now). Note that Gauged is append-only; data must be written in chronological order.

You can also write data to separate namespaces

```python
with gauged.writer as writer:
writer.add('requests', 1, namespace=1)
writer.add('requests', 1, namespace=2)
```

For more information, see the [technical overview][technical-overview].

## Reading data

##### gauged.aggregate(key, aggregate, start=None, end=None, namespace=None, percentile=None)

Fetch all values associated with the key during the specified date range (`[start, end)`), and then aggregate them using one of `Gauged.MIN`, `Gauged.MAX`, `Gauged.SUM`, `Gauged.COUNT`, `Gauged.MEAN`, `Gauged.MEDIAN`, `Gauged.STDDEV` or `Gauged.PERCENTILE`.

The `start` and `end` parameters can be either a timestamp in milliseconds, a `datetime` instance or a negative timestamp in milliseconds which is interpreted as relative to now. If omitted, both parameters default to the boundaries of all data that exists in the `namespace`. Note that the `Gauged.SECOND`, `Gauged.MINUTE`, `Gauged.HOUR`, `Gauged.DAY`, `GAUGED.WEEK` and `Gauged.NOW` constants can also be used.

Here's some examples

```python
# Count the total number of requests
requests = gauged.aggregate('requests', Gauged.SUM)

# Count the number of requests between 2014/01/01 and 2014/01/08
requests = gauged.aggregate('requests', Gauged.SUM, start=datetime(2014, 1, 1),
end=datetime(2014, 1, 8))

# Get the 95th percentile response time from the past week
response_time = gauged.aggregate('response_time', Gauged.PERCENTILE,
percentile=95, start=-Gauged.WEEK)
```

##### gauged.aggregate_series(key, aggregate, interval=Gauged.DAY, **kwargs)

The time series variant of `aggregate()`. This method takes the same kwargs as `aggregate()` and also accepts an `interval` in milliseconds.

This method returns a `TimeSeries` instance. See [gauged/results/time_series.py][time_series.py] for the result API.

The method is approximately equal to

```python
from gauged import TimeSeries
points = []
for timestamp in xrange(start, end, interval):
aggregate = gauged.aggregate(key, aggregate, start=timestamp, end=timestamp+interval)
points.append(( timestamp, aggregate ))
result = TimeSeries(points)
```

##### gauged.value(key, timestamp=None, namespace=None)

Read the value of a key at the specified time (defaults to now).

##### gauged.value_series(key, start=None, end=None, interval=Gauged.DAY, namespace=None)

The time series variant of `value()` which reads the value of a key at each `interval` steps in the range `[start, end)`.

##### gauged.keys(prefix=None, limit=None, offset=None, namespace=None)

Get a list of keys, optionally filtered by namespace or prefix.

##### gauged.namespaces()

Get a list of namespaces.

##### gauged.statistics(start=None, end=None, namespace=None)

Get write statistics for the specified namespace during the specified date range. The statistics include number of data points and the number of bytes they consume. See [gauged/results/statistics.py][statistics.py] for the result API.

## Plotting

The data can be plotted easily with [matplotlib][matplotlib]

```python
import pylab

series = gauged.aggregate_series('requests', gauged.SUM, interval=gauged.DAY,
start=-gauged.WEEK)
pylab.plot(series.dates, series.values, label='Requests per day for the past week')
pylab.show()
```

## Configuration

Configuration should be passed to the constructor

```python
gauged = Gauged('mysql://root@localhost/gauged', **config)
```

Configuration keys

- **key_whitelist** - a list of allowed keys. Default is `None`, i.e. allow all keys.
- **flush_seconds** - whether to periodically flush data when writing, e.g. `10` would cause a flush every 10 seconds. Default is `0` (don't flush).
- **namespace** - the default namespace to read and write to. Defaults to `0`.
- **key_overflow** - what to do when the key size is greater than the backend allows, either `Gauged.ERROR` (default) or `Gauged.IGNORE`.
- **gauge_nan** - what to do when attempting to write a `NaN` value, either `Gauged.ERROR` (default) or `Gauged.IGNORE`.
- **append_only_violation** - what to do when writes aren't done in chronological order, either `Gauged.ERROR` (default), `Gauged.IGNORE`, or `Gauged.REWRITE` which rewrites out-of-order timestamps in order to maintain the chronological constraint.
- **max_look_behind** - how far a `value(key, timestamp)` call will traverse when looking for the nearest measurement before `timestamp`. Default is `Gauged.WEEK`.
- **min_cache_interval** - time series calls with intervals smaller than this will not be cached. Default is `Gauged.HOUR`.
- **max_interval_steps** - throw an error if the number of interval steps is greater than this. Default is `31 * 24`.
- **block_size** - see the [technical overview][technical-overview]. Defaults to `Gauged.DAY`.
- **resolution** - see the [technical overview][technical-overview]. Defaults to `Gauged.SECOND`.

## Tests

You can run a subset of the test suite using only an in-memory driver with `make check-quick`.

To run the full suite, first edit the configuration in `test_drivers.cfg` so that PostgreSQL and Mysql both point to existing (and empty) databases, then run

```bash
$ make check
```

You can run coverage analysis with `make coverage` and run a lint tool `make lint`.

## Benchmarks

Use `python benchmark [OPTIONS]`

```bash
$ python benchmark.py --number 1000000 --days 365
Writing to sqlite:// (block_size=86400000, resolution=1000)
Spreading 1M measurements to key "foobar" over 365 days
Wrote 1M measurements in 4.912 seconds (203.6K/s) (rss: 12.4MB)
Gauge data uses 7.6MB (8B per measurement)
min() in 0.022s (read 45.2M measurements/s) (rss: 12.5MB)
max() in 0.022s (read 44.8M measurements/s) (rss: 12.6MB)
sum() in 0.023s (read 43.1M measurements/s) (rss: 12.6MB)
count() in 0.023s (read 43.2M measurements/s) (rss: 12.6MB)
mean() in 0.029s (read 35M measurements/s) (rss: 12.6MB)
stddev() in 0.044s (read 22.8M measurements/s) (rss: 23.6MB)
median() in 0.069s (read 14.6M measurements/s) (rss: 31.9MB)
```

You can also run `make cbenchmark` to run C benchmarks.

## License

GPLv3


[technical-overview]: https://github.com/chriso/gauged/blob/master/docs/technical-overview.md
[time_series.py]: https://github.com/chriso/gauged/blob/master/gauged/results/time_series.py
[statistics.py]: https://github.com/chriso/gauged/blob/master/gauged/results/statistics.py
[matplotlib]: http://matplotlib.org/

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gauged-0.1.0.tar.gz (48.5 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page