processor

A microframework to build source -> filter -> action workflows.

Simple rules

Python processor is a tool for creating chained pipelines for data processing. It has very few key concepts:

Data object: Any Python dict with two required fields: source and type.
Source: An iterable sequence of data objects or a function which returns data objects. See the full list of sources in the docs.
Output: A function which accepts a data object as input and may return another (or the same) data object as a result. See the full list of outputs in the docs.
Predicate: A pipeline consists of sources and outputs, and a predicate decides which data object should be processed by which output.
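To make these four concepts concrete, here is a minimal sketch in plain Python. It does not use processor itself: the names numbers_source, is_even, double and run_sketch are hypothetical, and run_sketch is a toy stand-in for illustration, not the library's run_pipeline.

```python
def numbers_source():
    """Source: a function which yields data objects (dicts with source and type)."""
    for n in range(5):
        yield {'source': 'numbers', 'type': 'number', 'value': n}


def is_even(obj):
    """Predicate: decides whether an output should see this data object."""
    return obj['value'] % 2 == 0


def double(obj):
    """Output: accepts a data object and returns another one."""
    return dict(obj, value=obj['value'] * 2)


def run_sketch(source, rules):
    """Toy runner (an assumption, not processor's API): apply matching outputs."""
    results = []
    for obj in source():
        for predicate, output in rules:
            if predicate(obj):
                results.append(output(obj))
    return results


results = run_sketch(numbers_source, [(is_even, double)])
print([r['value'] for r in results])
```

Only the objects accepted by the predicate reach the output; the others are silently skipped.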

Quick example

Here is an example of a pipeline which reads an IMAP folder and sends all emails to a Slack chat:

run_pipeline(
    sources.imap('imap.gmail.com',
                 'username',
                 'password',
                 'INBOX'),
    [prepare_email_for_slack, outputs.slack(SLACK_URL)])

Here you construct a pipeline which uses sources.imap to read the IMAP folder “INBOX” of username@gmail.com. In more complex cases, outputs.fanout can be used for routing data objects to different processors, and sources.mix can be used to merge items from two or more sources into one stream.
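The idea behind merging and routing can be sketched in plain Python. This is only an illustration of the concept: mix_sketch and fanout_sketch are hypothetical names, and the real sources.mix and outputs.fanout may take different arguments and interleave items in a different order.

```python
from itertools import zip_longest

_MISSING = object()  # marks exhausted iterables inside zip_longest


def mix_sketch(*iterables):
    """Merge several iterables into one stream, taking items round-robin."""
    for group in zip_longest(*iterables, fillvalue=_MISSING):
        for item in group:
            if item is not _MISSING:
                yield item


def fanout_sketch(*routes):
    """Return an output which passes an object to every handler whose predicate matches."""
    def output(obj):
        for predicate, handler in routes:
            if predicate(obj):
                handler(obj)
    return output


emails = [{'source': 'imap', 'type': 'email', 'n': i} for i in (1, 2, 3)]
tweets = [{'source': 'twitter', 'type': 'tweet', 'n': 1}]

email_box, tweet_box = [], []
route = fanout_sketch(
    (lambda obj: obj['type'] == 'email', email_box.append),
    (lambda obj: obj['type'] == 'tweet', tweet_box.append),
)

merged = list(mix_sketch(emails, tweets))
for obj in merged:
    route(obj)
```

After the loop, email_box holds the three emails and tweet_box holds the single tweet.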

Functions prepare_email_for_slack and outputs.slack(SLACK_URL) are processors. The first one is a simple function which accepts a data object returned by the imap source and transforms it into a data object which can be used by outputs.slack. We need that because Slack requires a different set of fields. The call to outputs.slack(SLACK_URL) returns a function which takes an object and sends it to the specified Slack endpoint.
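A rough sketch of both kinds of processor is shown below. The field names subject, from, text and username are assumptions made for the illustration, not the documented fields of the imap source or the slack output, and slack_like_output is a hypothetical stand-in which records messages instead of performing an HTTP request.

```python
def prepare_email_for_slack(email):
    """Map an email data object to the fields a chat output might expect."""
    # 'subject' and 'from' are assumed keys of the email data object.
    return {'text': 'New email: {0}'.format(email['subject']),
            'username': email['from']}


def slack_like_output(url, send):
    """Return a processor bound to `url`; `send` is an injected transport."""
    def output(obj):
        send(url, obj)
    return output


sent = []  # collects (url, message) pairs instead of calling a real endpoint
post = slack_like_output('https://hooks.example.invalid/T000',
                         lambda url, obj: sent.append((url, obj)))

post(prepare_email_for_slack(
    {'source': 'imap', 'type': 'email', 'subject': 'Hello', 'from': 'alice'}))
```

The point of the closure is that configuration (the URL) is supplied once, while the returned function only needs the data object.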

This is just an example; for working snippets, continue reading this documentation ;-)

Installation

Create a virtual environment with python3::

virtualenv --python=python3 env
source env/bin/activate

Install the required version of hylang (this step is necessary because Hy syntax is not final yet and is frequently changed by the language maintainers)::

pip install -U 'git+https://github.com/hylang/hy.git@a3bd90390cb37b46ae33ce3a73ee84a0feacce7d#egg=hy'

If you are on OS X, install lxml separately::

STATIC_DEPS=true pip install lxml

If you want to access IMAP over SSL on OS X, you need to install openssl via Homebrew and then install pyopenssl like this::

brew install openssl
env LDFLAGS="-L$(brew --prefix openssl)/lib" \
    CFLAGS="-I$(brew --prefix openssl)/include" \
    pip install -U --force-reinstall pyopenssl

Then install the processor::

pip install processor

Usage

Now create an executable Python script where you’ll place your pipeline’s configuration. For example, this simple code creates a pipeline which searches for new results on Twitter and outputs them to the console. Of course, you can output them not only to the console, but also send them by email, post them to a Slack chat, or deliver them anywhere else there is an output for:

#!env/bin/python3
import os
from processor import run_pipeline, sources, outputs
from twiggy_goodies.setup import setup_logging


for_any_message = lambda msg: True

def prepare(tweet):
    return {'text': tweet['text'],
            'from': tweet['user']['screen_name']}

setup_logging('twitter.log')

run_pipeline(
    sources=[sources.twitter.search(
        'My Company',
        consumer_key='***', consumer_secret='***',
        access_token='***', access_secret='***',
        )],
    rules=[(for_any_message, [prepare, outputs.debug()])])

Running this code will fetch new results for the search query My Company and output them on the screen. Of course, you could use any other output supported by the processor. Browse the online documentation to find out which sources and outputs are supported and how to configure them.

Ideas for Sources and Outputs

web-hook endpoint (in progress).
tail source which reads a file and outputs lines that appeared in it between invocations, or is able to emulate tail -f behaviour. The Python module tailer could be used here.
grep output – a filter to grep some fields using patterns. With tail and grep you could build a pipeline which watches a log and sends errors by email or to a chat.
xmpp output.
irc output.
rss/atom feed reader.
weather source which tracks tomorrow’s weather forecast and outputs a message if it was changed significantly, for example from “sunny” to “rainy”.
github: some integrations with the GitHub API?
jira or other task tracker of your choice?
suggest your ideas!
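The tail and grep ideas above could look roughly like the following sketch. Neither exists in processor at the time of writing: tail_sketch and grep_sketch are hypothetical names, the offset is kept in a plain dict rather than in processor's storage, and the standard library is used instead of the tailer module.

```python
import os
import re
import tempfile


def tail_sketch(path, state):
    """Yield lines appended to `path` since the offset remembered in `state`."""
    with open(path, 'r') as f:
        f.seek(state.get('offset', 0))
        while True:
            line = f.readline()
            if not line:
                break
            yield {'source': 'tail', 'type': 'line', 'text': line.rstrip('\n')}
        state['offset'] = f.tell()  # remember where this invocation stopped


def grep_sketch(pattern, field='text'):
    """Return a filter which keeps objects whose `field` matches `pattern`."""
    regex = re.compile(pattern)

    def output(obj):
        return obj if regex.search(obj.get(field, '')) else None
    return output


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, 'app.log')
    with open(log_path, 'w') as f:
        f.write('INFO started\nERROR disk full\n')

    state = {}
    only_errors = grep_sketch(r'ERROR')
    first_run = [o['text'] for o in tail_sketch(log_path, state) if only_errors(o)]

    with open(log_path, 'a') as f:
        f.write('INFO still running\nERROR out of memory\n')
    second_run = [o['text'] for o in tail_sketch(log_path, state) if only_errors(o)]
```

The second invocation sees only the lines written after the first one, which is the "between invocations" behaviour described in the list.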

Documentation

https://python-processor.readthedocs.org/

Development

To run all the tests, run:

tox

Authors

Alexander Artemenko - http://dev.svetlyak.ru

Changelog

0.10.0 (2016-01-04)

IMAP source was fixed to work with the new IMAPClient API and to support IMAPClient > 1.0.0.
Datastorage was fixed to get the filename from the PROCESSOR_DB environment variable even if it was set using os.environ['PROCESSOR_DB'] = 'some.db' after the imports.
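The difference between reading an environment variable at import time and reading it on demand can be sketched as follows. This is an illustration of the general pattern only; get_db_filename and the default name processor.db are hypothetical and are not taken from processor's source.

```python
import os

DEFAULT_DB = 'processor.db'  # hypothetical default, for illustration only

# Make the demonstration independent of the caller's environment.
os.environ.pop('PROCESSOR_DB', None)

# Import-time lookup: the value is frozen when this line runs.
FROZEN_DB = os.environ.get('PROCESSOR_DB', DEFAULT_DB)


def get_db_filename():
    """On-demand lookup: the variable is read each time it is needed."""
    return os.environ.get('PROCESSOR_DB', DEFAULT_DB)


# The variable is set only after the "imports" above have already run.
os.environ['PROCESSOR_DB'] = 'some.db'

print(FROZEN_DB)          # still the default
print(get_db_filename())  # picks up the new value
```

With the on-demand lookup, a script may configure the database location anywhere before the storage is first used.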

0.9.0 (2015-12-06)

Code was fixed to work with HyLang from commit a3bd90390cb37b46ae33ce3a73ee84a0feacce7d. Please use this pinned version of HyLang and follow future release notes to know when this requirement changes.

0.8.0 (2015-11-16)

Code was fixed to work with the latest Hy from GitHub.
Added twitter.mentions source to read a stream of mentions from Twitter.
Fixed the way the number of messages from an IMAP folder is limited. Previously the limit was applied even when the ID of the last seen message was already known; now the limit is ignored in this case and only applied when visiting the folder for the first time.
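The limiting rule described in the last item can be sketched like this. The function select_message_ids is hypothetical, and two details are assumptions: message IDs are treated as increasing integers, and on the first visit the newest messages are the ones kept.

```python
def select_message_ids(all_ids, last_seen_id=None, limit=10):
    """Pick which message IDs to fetch from a folder."""
    ids = sorted(all_ids)
    if last_seen_id is not None:
        # The folder was visited before: take everything new, ignore the limit.
        return [i for i in ids if i > last_seen_id]
    # First visit: apply the limit (assumed here to keep the newest messages).
    return ids[-limit:] if limit > 0 else []


first_visit = select_message_ids(range(1, 101), limit=5)
later_visit = select_message_ids(range(1, 101), last_seen_id=80, limit=5)
```

On the later visit all twenty unseen messages are returned even though the limit is five.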

0.7.0 (2015-05-05)

A new output, XMPP, was added, and now processor is able to notify Jabber users.

0.6.0 (2015-05-01)

The biggest change in this release is a new source – github.releases. It is able to read all new releases in a given repository and send them into the processing pipeline. This works for both public and private repositories. Read the docs for further details.

Other changes are:

Storage backend now saves the JSON database pretty-printed so that you can read and edit it in your favorite editor. This is Emacs, right?
Twitter.search source now saves state after the tweet was processed. This way processor shouldn’t lose tweets if there was an exception somewhere in the processing pipeline.
IMAP source was fixed and is now able to fetch emails from really big folders.
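Why saving state after processing protects against lost tweets can be shown with a small sketch. It is a simplified model, not the Twitter.search implementation: process_all and flaky_handler are hypothetical, and the state is a plain dict instead of processor's storage backend.

```python
def process_all(items, handler, state):
    """Call handler on each item; remember an item's id only after success."""
    for item in items:
        if item['id'] <= state.get('last_id', 0):
            continue  # already processed on an earlier run
        handler(item)                  # may raise
        state['last_id'] = item['id']  # saved only after the handler succeeded


tweets = [{'id': 1, 'text': 'a'}, {'id': 2, 'text': 'b'}, {'id': 3, 'text': 'c'}]
seen = []
fail_on = {2}


def flaky_handler(tweet):
    if tweet['id'] in fail_on:
        raise RuntimeError('temporary failure')
    seen.append(tweet['id'])


state = {}
try:
    process_all(tweets, flaky_handler, state)
except RuntimeError:
    pass  # state still points at tweet 1, so tweet 2 is not lost

fail_on.clear()
process_all(tweets, flaky_handler, state)  # the retry resumes from tweet 2
```

Had the state been saved before calling the handler, the failed tweet would have been marked as seen and skipped on the retry.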

0.5.0 (2015-04-15)

Good news, everyone! New output was added - email. Now Processor is able to notify you via email about any event.

0.4.0 (2015-04-06)

Function run_pipeline was simplified and now accepts only one source and one output. To implement more complex pipelines, use the sources.mix and outputs.fanout helpers.

0.3.0 (2015-04-01)

Added a web.hook source.
Now a source can be not only an iterable object, but any function which returns values.

0.2.1 (2015-03-30)

Fixed an error in the import-or-error macro which prevented using third-party libraries.

0.2.0 (2015-03-30)

Most third-party libraries are optional now. If you want to use an extension which requires an external library, it will issue an error and call sys.exit(1) until you satisfy this requirement.

This should make life easier for those who do not want to use the rss output, which requires feedgen, which requires lxml, which is hard to build because it is a C extension.
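The optional-dependency behaviour can be approximated in plain Python as shown below. The real import-or-error is a Hy macro; this import_or_error function is only a hypothetical Python analogue, and the hint text is made up for the example.

```python
import importlib
import sys


def import_or_error(module_name, hint):
    """Import a module, or print a hint and exit with status 1 if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        print(hint, file=sys.stderr)
        sys.exit(1)


# An installed module is simply returned.
json = import_or_error('json', 'json is part of the standard library')
print(json.dumps({'ok': True}))

# For a missing module, the call would print the hint and terminate the script:
# feedgen = import_or_error('feedgen', 'Please run: pip install feedgen')
```

Because the check happens only when an extension is used, installing processor itself does not pull in libraries that are hard to build.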

0.1.0 (2015-03-18)

First release on PyPI.



