Jterator

A minimalistic pipeline engine

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Jterator
========

A minimalistic pipeline engine with motivation to be flexible, but uncomplicated enough, so that no GUI is required to work with it.

* use simple format like JSON for basic parametrization whenever possible
** don't be afraid to keep your input settings on the disk
* for performance heavy IO use HDF5
* used scientific data analysis.
* UNIX pipeline is a one great idea, but it is mostly restricted to text processing

Modules
-------

Think of your pipeline as a sequence of connected modules (a linked list). Each module is a program that get and reads JSON from on STDIN file descriptor.
Such JSON contains all the input settings required by the module.

Getting started
---------------

Each pipeline has the following layout on the disk:

* **handles** folder contains all the JSON handles files, the are passed as STDIN to *modules*.
* **modules** folder contains all the executable plus code for programs corresponding
* **logs** folder contains all the output from STDOU and STERR streams, obtain for each executable that has been executed.
* **data** folder contains all the heavy data output like HDF5, etc. These data are shared between modules.

Jterator allows only very simplistic type of workflow - *pipeline* (somewhat similar to a UNIX-world pipeline). Description of such workflow must be put sibling to the folder structure described about, i.e. inside the Jterator pipeline (project) folder. Recognizable file name must be one of **['JteratorPipe.json', 'jt.pipe']**. Description is a JSON format.

```json
{
"name": "Foo",
"version": "0.0.1",
"pipeline": [
{
"name": "Bar",
"module": "bar",
"handles": "handles/some_bar"
}, {
"name": "Baz",
"module": "baz",
"handles": "handles/some_baz"
}
],
"tests": [
{
"type": "hdf5_dependency",
"module_name": "baz",
"input": ["bar"],
"output": ["baz"]
}
]
}

```

To *run* your first pipeline, do:

```bash
cd /my/first/jterator/pipeline/folder && jt run
```

Developing new modules
======================

This is a small walk-through on how to develop a new module for *Jterator*. Each module as to follow a particular convention of processing input parameters. It can be written in virtually any programming language as long as such language can provide tools for working with *JSON* and *HDF5* data formats.

Developing Jterator
===================

Latest code is available at https://github.com/ewiger/Jterator

Nose tests
----------

We use nose framework to achieve code coverage with unit tests. In order to run tests, do

```bash
cd tests && nosetests
```

Project details

These details have not been verified by PyPI

Project links

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

This version

0.0.1

Aug 4, 2014

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Jterator-0.0.1.tar.gz (8.0 kB view hashes)

Uploaded Aug 4, 2014 Source

Hashes for Jterator-0.0.1.tar.gz

Hashes for Jterator-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`816c77399a822a13b1997fd7a6630ac84115fa35d0403b88bdb20c343434ede8`
MD5	`5dcf67a822cb623b4afef02778294a13`
BLAKE2b-256	`4fd88f2401e4eb490f157344d4b1c8a0e425ec362d472ff0d8a8534305cb51ff`