Skip to main content

Hyperstream is a large-scale, flexible and robust software package for processing streaming data

Project description

![HyperStream logo](https://cdn.rawgit.com/IRC-SPHERE/HyperStream/dfbac332/hyperstream.svg)



# HyperStream #

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.242227.svg)](https://doi.org/10.5281/zenodo.242227)
[![Join the chat at https://gitter.im/IRC-SPHERE-HyperStream/Lobby](https://badges.gitter.im/IRC-SPHERE-HyperStream/Lobby.svg)](https://gitter.im/IRC-SPHERE-HyperStream/Lobby?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![Build Status](https://travis-ci.org/IRC-SPHERE/HyperStream.svg?branch=master)](https://travis-ci.org/IRC-SPHERE/HyperStream)
[![Dependency Status](https://www.versioneye.com/user/projects/58e423cb26a5bb005220301e/badge.svg?style=flat-square)](https://www.versioneye.com/user/projects/58e423cb26a5bb005220301e)
[![Test Coverage](https://codeclimate.com/github/IRC-SPHERE/HyperStream/badges/coverage.svg)](https://codeclimate.com/github/IRC-SPHERE/HyperStream/coverage)
[![Issue Count](https://codeclimate.com/github/IRC-SPHERE/HyperStream/badges/issue_count.svg)](https://codeclimate.com/github/IRC-SPHERE/HyperStream)
[![Documentation Status](https://readthedocs.org/projects/hyperstream/badge/?version=latest)](http://hyperstream.readthedocs.io/en/latest/?badge=latest)

Hyperstream is a large-scale, flexible and robust software package for processing streaming data.

Hyperstream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion, and prediction both in online and offline forms in streaming environments. Although developed specifically for SPHERE, Hyperstream is a general purpose tool that is well-suited for the design, development, and deployment of algorithms and predictive models in a wide space of sequential predictive problems.

This software has been designed from the outset to be domain-independent, in order to provide maximum value to the wider community. Key aspects of the software include the capability to create complex interlinked workflows, and a computational engine that is designed to be "compute-on-request", meaning that no unnecessary resources are used.

```diff
- NOTE: This software is in a stable but early beta, and hence the API may change significantly.
```

The system consists of the following 3 layers, from bottom up:

## Tools ##
Tools are the computation elements. They take in input data in a standard format (dict of list of
hyperstream.instance.Instance objects) and output data in a standard format (list of
hyperstream.instance.Instance objects). Tools are version controlled. Minor version numbers should be used for updates
that will not require recomputing streams, since the output should be identical (in expectation for stochastic
streams). The output should be identical (again, in expectation for stochastic streams) regardless of whether the tool is run twice on time-ranges t1..t2 and t2..t3 or just once on the time-range t1..t3. Major version number changes will cause the stream to be recomputed.

### Tool Versions ###
The tool versions form a major/minor/patch 3-tuple, e.g. v1.3.2. See https://pypi.python.org/pypi/semantic_version/ for details.
In our setting the major version number is treated as a binary flag: 0 for development, 1 for production. Minor version
numbers indicate changes that affect the output, or in the API. The patch number indicates changes that do not affect the
output or API in any way (e.g. speedups).

## Streams ##
Streams are objects that use a particular kernel for computation, with fixed parameters and filters defined that can
reduce the amount of data that needs to be read from the database. The stream is physically manifested in the database
(mongodb) for the time ranges that it has been computed on.

There are special data streams, for which a custom hyperstream.interface.Input or hyperstream.interface.Output objects
can be defined, in order to work with custom databases or file-based storage.

## Workflows ##
Workflows define a graph of streams. Usually, the first stream will be a special "raw" stream that pulls in data from a
custom data source. Workflows can have multiple time ranges, which will cause the streams to be computed on all of the
ranges given.

# Installation #
## Docker images ##
If you do not want to install all the packages separately you can use our Docker bundle available [here](https://github.com/IRC-SPHERE/Hyperstream-Dockerfiles).

## Local machine ##

``` Bash
git clone git@github.com:IRC-SPHERE/HyperStream.git
cd HyperStream
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt
python -c 'from hyperstream import HyperStream'
```

Or install via pip

```
pip install -U git+git://github.com/IRC-SPHERE/HyperStream.git#egg=hyperstream
pip install -r requirements.txt
python -c 'from hyperstream import HyperStream'
```

Additionally, one of the requirements to run Hyperstream is a MongoDB server. By default, Hyperstream tries to connect to the port 27017 on the localhost.

To see the installation steps of MongoDB go to the [official documentation][1]. E.g. in a Debian OS it is possible to install with the following command

[1]: https://docs.mongodb.com/manual/installation/

``` Bash
sudo apt-get install mongodb
```

Once the MongoDB server is installed, it can be started with the following command

``` Bash
service mongod start
```

# Running tests #

Run the following command
```
nosetests
```

Note that for the MQTT logging test to succeed, you will need to have an MQTT broker running (e.g. Mosquitto). For example:

```
docker run -ti -p 1883:1883 -p 9001:9001 toke/mosquitto
```

or on OSX you will need pidof and mosquitto:

```
brew install pidof
brew install mosquitto
brew services start mosquitto
```

# Running the examples #

Super simple example:

```
from hyperstream import HyperStream, StreamId, TimeInterval
from hyperstream.utils import utcnow, UTC
from datetime import timedelta

hs = HyperStream(loglevel=20)
M = hs.channel_manager.memory
T = hs.channel_manager.tools
clock = StreamId(name="clock")
clock_tool = T[clock].window().last().value()
ticker = M.get_or_create_stream(stream_id=StreamId(name="ticker"))
now = utcnow()
before = (now - timedelta(seconds=30)).replace(tzinfo=UTC)
ti = TimeInterval(before, now)
clock_tool.execute(sources=[], sink=ticker, interval=ti, alignment_stream=None)
print(list(ticker.window().tail(5)))

[StreamInstance(timestamp=datetime.datetime(2017, 6, 30, 16, 23, 39, tzinfo=<UTC>), value=datetime.datetime(2017, 6, 30, 16, 23, 39, tzinfo=<UTC>)), StreamInstance(timestamp=datetime.datetime(2017, 6, 30, 16, 23, 40, tzinfo=<UTC>), value=datetime.datetime(2017, 6, 30, 16, 23, 40, tzinfo=<UTC>)), StreamInstance(timestamp=datetime.datetime(2017, 6, 30, 16, 23, 41, tzinfo=<UTC>), value=datetime.datetime(2017, 6, 30, 16, 23, 41, tzinfo=<UTC>)), StreamInstance(timestamp=datetime.datetime(2017, 6, 30, 16, 23, 42, tzinfo=<UTC>), value=datetime.datetime(2017, 6, 30, 16, 23, 42, tzinfo=<UTC>)), StreamInstance(timestamp=datetime.datetime(2017, 6, 30, 16, 23, 43, tzinfo=<UTC>), value=datetime.datetime(2017, 6, 30, 16, 23, 43, tzinfo=<UTC>))]
```

- more examples in the [tutorials branch](https://github.com/IRC-SPHERE/HyperStream/tree/tutorials/examples) (to be merged):

# HyperStream Viewer #
The [HyperStream Viewer](https://github.com/IRC-SPHERE/HyperStreamViewer) is a python/Flask web-app for interacting with HyperStream. In order to keep HyperStream to a minimum, this web-app is released as a separate repository that takes the core as a dependency.

# License #

This code is released under the [MIT license](https://github.com/IRC-SPHERE/HyperStream/blob/master/LICENSE).

# Acknowledgements #

This work has been funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under Grant [EP/K031910/1](http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/K031910/1) - "SPHERE Interdisciplinary Research Collaboration".

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hyperstream-0.2.6.tar.gz (78.3 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page