data-depgraph

Small dependency resolution library for scientific datasets


Project description

depgraph is a tiny Python library for expressing networks of dependencies required to construct datasets. Networks are declared in terms of the relationships between source and target datasets (network graph edges). depgraph can then report descendants and parents for any particular node and instruct builds in a manner similar to make. When a DependencyGraph object returns a dataset that must be built, it provides a reason, such as:

the dataset is missing
the dataset is out of date and required by another dataset
the dataset is a target dataset
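
The make-like decision described above can be sketched without depgraph itself. What follows is a minimal illustration, assuming modification times are kept in a plain dictionary and that raw datasets already exist. The names `PARENTS`, `ancestors`, `descendants`, `needs_build` and `buildable` are hypothetical helpers written for this sketch; they are not depgraph's API and do not reflect its implementation.

```python
# Illustrative only: a plain-Python model of make-like rebuild decisions.
# None of these names come from depgraph.

# Each dataset maps to the datasets it is built from (its parents).
PARENTS = {
    "DA0": ["R0", "R1"],
    "DB0": ["DA0"],
}


def ancestors(node, parents=PARENTS):
    """Return every dataset that `node` depends on, directly or indirectly."""
    found = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents.get(current, []))
    return found


def descendants(node, parents=PARENTS):
    """Return every dataset that depends on `node`, directly or indirectly."""
    return {child for child in parents if node in ancestors(child, parents)}


def needs_build(node, mtimes, parents=PARENTS):
    """Return a reason string if `node` must be built, otherwise None.

    `mtimes` maps dataset names to modification times; a name that is
    absent from `mtimes` is treated as not existing yet.
    """
    if node not in mtimes:
        return "missing"
    for parent in parents.get(node, []):
        # A parent that is missing or newer makes this dataset stale.
        if mtimes.get(parent, float("inf")) > mtimes[node]:
            return "out of date"
    return None


def buildable(target, mtimes, parents=PARENTS):
    """List (dataset, reason) pairs that can be built right now for `target`."""
    steps = []
    for node in sorted(ancestors(target, parents) | {target}):
        reason = needs_build(node, mtimes, parents)
        if reason is None:
            continue
        # Only offer a dataset once all of its own parents are up to date.
        if all(needs_build(p, mtimes, parents) is None
               for p in parents.get(node, [])):
            steps.append((node, reason))
    return steps
```

Calling `buildable` repeatedly, and updating `mtimes` after each build, walks the graph from raw data towards the requested target until nothing remains to be built.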

depgraph is intended to be a component for assembling dataset build tools. Important considerations for such a build tool are that it must:

permit reproducible analysis
be self-documenting
perform fast rebuilds to enable experimentation

depgraph contains the following classes:

depgraph.DependencyGraph
depgraph.Dataset
depgraph.DatasetGroup
depgraph.Reason

Example

Declare a set of dependencies resembling the graph below:

R0      R1      R2      R3         [raw data]
  \     /       |       |
    DA0         DA1    /
        \      /  \   /
           DB0     DB1
            \     / |  \
             \   /  |   \
              DC0  DC1  DC2        [products]

from depgraph import Dataset, DependencyGraph

# Define Datasets
# use an optional keyword `tool` to provide a key instructing our build tool
# how to assemble this product
R0 = Dataset("data/raw0", tool="read_csv")
R1 = Dataset("data/raw1", tool="read_csv")
R2 = Dataset("data/raw2", tool="database_query")
R3 = Dataset("data/raw3", tool="read_hdf")

DA0 = Dataset("step1/da0", tool="merge_fish_counts")
DA1 = Dataset("step1/da1", tool="process_filter")

DB0 = Dataset("step2/db0", tool="join_counts")
DB1 = Dataset("step2/db1", tool="join_by_date")

DC0 = Dataset("results/dc0", tool="merge_model_obs")
DC1 = Dataset("results/dc1", tool="compute_uncertainty")
DC2 = Dataset("results/dc2", tool="make_plots")

graph = DependencyGraph()

# Declare relationships
graph.add_dataset(DA0, (R0, R1))
graph.add_dataset(DA1, (R2,))
graph.add_dataset(DB0, (DA0, DA1))
graph.add_dataset(DB1, (DA1, R3))
graph.add_dataset(DC0, (DB0, DB1))
graph.add_dataset(DC1, (DB1,))
graph.add_dataset(DC2, (DB1,))

# Query the build steps needed to build a product
while True:
    targets = graph.buildable(DC1)

    if len(targets) == 0:
        break

    for target, reason in targets:
        # Each target is a dataset with a 'name' attribute and whatever
        # additional keyword arguments were defined with it.
        # The 'reason' is a depgraph.Reason object that codifies why a
        # particular target is necessary (e.g. it's out of date, or it's
        # missing and required by a subsequent target).
        print("Building {0} with {1} because {2}".format(target.name,
                                                         target.tool,
                                                         reason))
        # Call a function or start a subprocess that will result in the
        # target being built and saved to a file. `my_build_func` is a
        # placeholder for a user-supplied function; depgraph does not
        # provide it.
        my_build_func(target.tool, target.name)
        # [...]
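
The example above leaves `my_build_func` undefined. One way to write it is a small dispatch table, sketched here under the assumption that each `tool` key names a Python callable taking the target name. The `TOOLS` registry, the `register` decorator and the placeholder tool body are hypothetical and not part of depgraph.

```python
# Hypothetical dispatch helper for the `tool` keys used in the example.
TOOLS = {}


def register(key):
    """Decorator that files a build function under a tool key."""
    def decorator(func):
        TOOLS[key] = func
        return func
    return decorator


@register("read_csv")
def read_csv(name):
    # Placeholder body: a real tool would read its inputs and write the
    # dataset called `name` to disk.
    return "read_csv built {0}".format(name)


def my_build_func(tool, name):
    """Look up the function registered for `tool` and run it on `name`."""
    try:
        func = TOOLS[tool]
    except KeyError:
        raise ValueError(
            "no build function registered for tool {0!r}".format(tool))
    return func(name)
```

Keeping the mapping from tool keys to functions in one place means the dependency declarations stay free of build logic, and an unknown key fails loudly instead of silently skipping a build.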

Release history

0.4.4 (Nov 9, 2017)
0.4.3 (Jul 21, 2017)
0.4.2 (Jun 29, 2017)
0.4.1 (Jun 22, 2017)
0.4.0 (Dec 14, 2016)
0.3.5 (Oct 26, 2016)
0.3.4 (Apr 10, 2016)
0.3.3 (Apr 8, 2016)
0.3.2 (Apr 7, 2016)
0.3 (Mar 29, 2016)
0.3.dev0 (pre-release, Mar 18, 2016)
0.2 (Mar 16, 2016)
0.1.dev1 (pre-release, Mar 16, 2016; this version)
0.1.dev0 (pre-release, Mar 16, 2016)

Download files

Source distribution: data-depgraph-0.1.dev1.tar.gz (6.4 kB), uploaded Mar 16, 2016

Hashes for data-depgraph-0.1.dev1.tar.gz

Algorithm	Hash digest
SHA256	`12b08dabe8ab90407bfb042c49d87d3cb62aa2da93c8a56c2197a381c713db86`
MD5	`5b9f8d740b79f312c42c4dc0c32eae45`
BLAKE2b-256	`451c89220bd861e956b8725774cfb844cf0bcd8164c385202018fd6c9ab98833`