Parallel pipelines for Python

Description

PaPy, which stands for parallel pipelines in Python, is a highly flexible framework that addresses the problem of creating scalable workflows to process or generate data. A workflow is created from Python functions (nodes) with well-defined call/return semantics, connected by pipes (edges) into a directed acyclic graph. Given the topology and input data, these functions are composed into nested higher-order maps, which are transparently and robustly evaluated in parallel on a single computer or on remote hosts. Local and remote computational resources can be flexibly pooled and assigned to functional nodes, which makes it easy to load-balance a pipeline and optimize its throughput. Data traverses the graph in batches of adjustable size, a trade-off between lazy evaluation, parallelism, and memory consumption. The simplicity and flexibility of distributed workflows built with PaPy bridge the gap between desktop and grid.
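
The idea of composing functions into nested maps that are evaluated lazily in batches can be sketched with the Python standard library alone. This is not PaPy's API: the names batched_map, double, increment, and run_pipeline are hypothetical and chosen only for illustration, and a thread pool stands in for PaPy's pooled local and remote resources.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batched_map(func, iterable, pool, batch_size):
    """Lazily map func over iterable, evaluating one batch at a time in pool."""
    iterator = iter(iterable)
    while True:
        # Pull at most batch_size items from the upstream stage.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        # pool.map evaluates the batch in parallel and preserves input order.
        for result in pool.map(func, batch):
            yield result


def double(x):
    """First node of the illustrative two-node pipeline."""
    return 2 * x


def increment(x):
    """Second node, consuming the output of double."""
    return x + 1


def run_pipeline(data, batch_size=2):
    """Chain the two nodes as nested maps and collect the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        stage1 = batched_map(double, data, pool, batch_size)
        stage2 = batched_map(increment, stage1, pool, batch_size)
        return list(stage2)


if __name__ == "__main__":
    print(run_pipeline(range(5)))
```

In this sketch a larger batch_size exposes more parallelism per stage but holds more intermediate results in memory, while a batch size of 1 is fully lazy and effectively serial, which mirrors the trade-off described above.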

Installation

If you have setuptools installed, the easiest way to get PaPy is with easy_install:

easy_install papy


Download files

Source Distribution

papy-1.0b1.tar.gz (1.8 MB)
