
Tools for batching jobs and dealing with file paths


Overview

This tool is intended to automate the generation of scripts that run analyses on datasets. To use it, you need a dataset that has been created (or annotated) with dtool.

Installation

To install the jobarchitect package from a copy of its source:

$ cd jobarchitect
$ python setup.py install

Use

To generate bash scripts for data analysis, first create a job template, e.g.:

$ echo "echo {input_file} > {output_file}" > job.tmpl
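The placeholders in job.tmpl look like Python str.format fields. Assuming jobarchitect fills them that way (an assumption, not something this page confirms), the substitution for a single dataset item amounts to the following. The paths are illustrative only.

```python
# Minimal sketch: filling the job template's placeholders for one dataset item.
# Assumption: {input_file} and {output_file} are replaced str.format-style;
# both paths below are hypothetical examples.
template = "echo {input_file} > {output_file}"

command = template.format(
    input_file="/data/example_dataset/data/my_file.txt",  # hypothetical absolute input path
    output_file="output/my_file.txt",                     # hypothetical output path
)
print(command)
# echo /data/example_dataset/data/my_file.txt > output/my_file.txt
```

Because the filled-in command is run by a shell, paths containing spaces would need quoting (for example with shlex.quote). This page does not say whether jobarchitect quotes them.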

Then create an example dataset:

$ datatool new dataset
project_name [project_name]:
dataset_name [dataset_name]: example_dataset
...

$ echo "My example data" > example_dataset/data/my_file.txt
$ datatool manifest update example_dataset/

Create an output directory:

$ mkdir output

Then you can generate analysis run scripts with:

$ sketchjob job.tmpl example_dataset/ output/
#!/bin/bash
_analyse_by_ids \
    --program_template="echo {input_file} > {output_file}" \
    --input_dataset_path=example_dataset/ \
    --output_root=output/ \
    e4c73fa7c34b76499ac13fc5c335fa007e9c3e8f
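The generated script is a single call to _analyse_by_ids carrying the template, the dataset path, the output root and the item identifiers. The following is a minimal sketch of emitting a script with that shape. The function name sketch_local_script is hypothetical, and this is not jobarchitect's own code.

```python
# Minimal sketch of emitting a script shaped like sketchjob's output above.
# Assumption: the script only needs to call _analyse_by_ids with these
# arguments; sketch_local_script is a hypothetical name.
import shlex


def sketch_local_script(template, dataset_path, output_root, identifiers):
    """Return the text of a bash script that analyses the given item identifiers."""
    args = [
        "_analyse_by_ids",
        "--program_template=" + template,
        "--input_dataset_path=" + dataset_path,
        "--output_root=" + output_root,
        *identifiers,
    ]
    # shlex.quote protects the template's spaces and ">" from the shell;
    # each argument goes on its own continuation line.
    command = " \\\n    ".join(shlex.quote(arg) for arg in args)
    return "#!/bin/bash\n" + command + "\n"


script = sketch_local_script(
    "echo {input_file} > {output_file}",
    "example_dataset/",
    "output/",
    ["e4c73fa7c34b76499ac13fc5c335fa007e9c3e8f"],
)
print(script, end="")
```

This sketch quotes the whole --program_template argument rather than only its value. After quote removal the shell passes the same word to _analyse_by_ids in both cases.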

Try the script with:

$ sketchjob job.tmpl example_dataset/ output/ > run.sh
$ bash run.sh
$ cat output/my_file.txt
/Users/hartleym/scratch/example_dataset/data/my_file.txt
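The output shows that {input_file} was replaced with the absolute path of the data file, and that the result was written to output/my_file.txt, named after the item. The following is a minimal sketch of that per-item step. It assumes that items live under the dataset's data/ directory and that identifiers are looked up in a plain dict. The dict and the helper names analyse_item and analyse_by_ids are hypothetical stand-ins, not jobarchitect's implementation.

```python
# Minimal sketch of a per-item analysis step.
# Assumptions: items live under <dataset>/data/, a plain dict maps
# identifier -> relative path, and outputs are named after that path.
# analyse_item and analyse_by_ids are hypothetical helpers.
import os
import subprocess
import tempfile


def analyse_item(template, dataset_path, output_root, relpath):
    """Fill the template for one item, run it through the shell, return the output path."""
    input_file = os.path.abspath(os.path.join(dataset_path, "data", relpath))
    output_file = os.path.join(output_root, relpath)
    os.makedirs(os.path.dirname(output_file) or ".", exist_ok=True)
    command = template.format(input_file=input_file, output_file=output_file)
    subprocess.run(command, shell=True, check=True)  # the template uses ">" redirection
    return output_file


def analyse_by_ids(template, dataset_path, output_root, manifest, identifiers):
    """Run the template for each identifier; manifest maps id -> relative path."""
    return [
        analyse_item(template, dataset_path, output_root, manifest[identifier])
        for identifier in identifiers
    ]


# Usage: rebuild the example dataset in a temporary directory and run one item.
with tempfile.TemporaryDirectory() as tmp:
    dataset = os.path.join(tmp, "example_dataset")
    os.makedirs(os.path.join(dataset, "data"))
    with open(os.path.join(dataset, "data", "my_file.txt"), "w") as fh:
        fh.write("My example data\n")

    manifest = {"e4c73fa7c34b76499ac13fc5c335fa007e9c3e8f": "my_file.txt"}  # hypothetical lookup
    outputs = analyse_by_ids(
        "echo {input_file} > {output_file}",
        dataset,
        os.path.join(tmp, "output"),
        manifest,
        list(manifest),
    )
    with open(outputs[0]) as fh:
        result = fh.read()
    expected = os.path.abspath(os.path.join(dataset, "data", "my_file.txt")) + "\n"

print(result, end="")
```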

Working with Docker

Building a Docker image

The tests require an example Docker image, which you can build with the provided script:

$ bash build_docker_image.sh

Running code with the Docker backend

By inspecting the script and its associated Dockerfile, you can see how to build Docker images that work with the jobarchitect Docker backend. For example:

$ sketchjob job.tmpl example_dataset/ output/ --backend=docker --image-name=jicscicomp/jobarchitect
#!/bin/bash
IMAGE_NAME=jicscicomp/jobarchitect
docker run \
    --rm \
    -v example_dataset/:/input_dataset:ro \
    -v output/:/output \
    $IMAGE_NAME \
    _analyse_by_ids \
        --program_template "echo {input_file} > {output_file}" \
        --input_dataset_path=/input_dataset \
        --output_root=/output \
        e4c73fa7c34b76499ac13fc5c335fa007e9c3e8f
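With the Docker backend, the dataset is mounted read-only at /input_dataset and the output directory at /output, and _analyse_by_ids receives those container paths instead of the host paths. The following is a minimal sketch of assembling that docker run argument list. It assumes the image provides _analyse_by_ids on its PATH, and build_docker_command is a hypothetical name. Docker bind mounts generally need absolute host paths, so this sketch resolves them with os.path.abspath, whereas the script above passes the relative paths through unchanged.

```python
# Minimal sketch of the Docker backend's command line.
# Assumptions: the image provides _analyse_by_ids, and the container paths
# /input_dataset and /output are fixed; build_docker_command is hypothetical.
import os
import shlex


def build_docker_command(image_name, template, dataset_path, output_root, identifiers):
    """Return the docker run argument list for analysing the given identifiers."""
    # Resolve host paths so Docker treats them as bind mounts, not volume names.
    dataset_host = os.path.abspath(dataset_path)
    output_host = os.path.abspath(output_root)
    return [
        "docker", "run", "--rm",
        "-v", f"{dataset_host}:/input_dataset:ro",  # dataset mounted read-only
        "-v", f"{output_host}:/output",              # results written back to the host
        image_name,
        "_analyse_by_ids",
        "--program_template", template,
        "--input_dataset_path=/input_dataset",
        "--output_root=/output",
        *identifiers,
    ]


argv = build_docker_command(
    "jicscicomp/jobarchitect",
    "echo {input_file} > {output_file}",
    "example_dataset/",
    "output/",
    ["e4c73fa7c34b76499ac13fc5c335fa007e9c3e8f"],
)
# shlex.join quotes each argument so the line can be pasted into a bash script.
print("#!/bin/bash\n" + shlex.join(argv))
```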


Download files


Source distribution: jobarchitect-0.1.0.tar.gz (4.9 kB)
