
Caching Workflow Engine


CacheFlow is a caching workflow engine: it executes dataflows while reusing previous results where appropriate, for efficiency. It is designed to be extensible and to embed easily into other projects.
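The core idea can be sketched in a few lines of Python: key each step's result by a hash of the step and its inputs, and skip re-execution when the key is already cached. This is purely illustrative; the function and variable names below are hypothetical and not the actual cacheflow API.

```python
import hashlib
import json

cache = {}   # maps content hash -> previously computed result
calls = []   # records which steps actually executed (for demonstration)

def cache_key(step_name, inputs):
    # Stable serialization of the step identity and its inputs.
    payload = json.dumps([step_name, inputs], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(step_name, func, inputs):
    # Reuse a cached result when the same step ran on the same inputs.
    key = cache_key(step_name, inputs)
    if key not in cache:
        calls.append(step_name)
        cache[key] = func(*inputs)
    return cache[key]

def run_dataflow(x):
    # A two-step dataflow: double the input, then add one.
    doubled = run_step("double", lambda a: a * 2, [x])
    return run_step("add_one", lambda a: a + 1, [doubled])

print(run_dataflow(3))  # → 7, both steps execute
print(run_dataflow(3))  # → 7, both results come from the cache
```

A real engine would also persist the cache across runs and invalidate entries when a module's code changes, but the hashing-and-lookup pattern is the same.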

Goals

  • ☑ Python 3 workflow system

  • ☑ Executes dataflows from JSON files

  • ☐ Can also load from SQL database

  • ☐ Parallel execution

  • ☐ Streaming

  • ☑ Extensible: can add new modules, new storage formats, new caching mechanisms, new executors

  • ☐ Pluggable: extensions can be installed from PyPI without forking

  • ☑ Re-usable: can execute workflows by itself, but can also be embedded into applications. Some applications I plan on developing myself:

    • Literate programming app: snippets or modules embedded in a Markdown file, executed on render (similar to Rmarkdown). Results would be cached, making later rendering fast.

    • Integrate in some of my NYU research projects (VisTrails Vizier, D3M)

Other ideas:

  • ☐ Use Jupyter kernels as backends to execute code (giving me quick access to all the languages they support)

  • ☐ Isolate script execution (to run untrusted Python/… code, for example with Docker)
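For the JSON-dataflow goal above, a workflow file might describe steps and the connections between them. The shape below is purely illustrative, assuming hypothetical module names; it is not cacheflow's actual on-disk format.

```json
{
  "steps": [
    {"id": "load", "module": "csv_reader", "parameters": {"path": "data.csv"}},
    {"id": "clean", "module": "drop_nulls", "inputs": {"table": "load.table"}}
  ]
}
```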

Non-goals

  • Make a super-scalable and fast workflow execution engine: I’d rather build executors based on Spark, Dask, or Ray than re-implement them

Status

Basic structures are here, extracted from D3M. Execution works.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • cacheflow-0.1.tar.gz (8.3 kB)

Built Distribution

  • cacheflow-0.1-py3-none-any.whl (11.9 kB)
