cwl-airflow
Python package to extend Apache-Airflow 1.9.0 functionality with CWL v1.0 support.
Try it out
- Install cwl-airflow
$ pip3 install cwl-airflow --user --find-links https://michael-kotliar.github.io/cwl-airflow-wheels/
- Init configuration
$ cwl-airflow init
- Run demo
$ cwl-airflow demo --auto
- When you see in the console output that Airflow Webserver is started, open the provided link
Installation requirements
- Ubuntu 16.04.4
- python 3.5.2
- pip3
wget https://bootstrap.pypa.io/get-pip.py
python3 get-pip.py --user
- setuptools
pip3 install setuptools --user
- docker
sudo apt-get update
sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install docker-ce
sudo groupadd docker
sudo usermod -aG docker $USER
Log out and log back in so that your group membership is re-evaluated.
- python3-dev
sudo apt-get install python3-dev
Configuration
When running cwl-airflow init, the following parameters can be specified:
- -l LIMIT, --limit LIMIT: sets the number of processed jobs kept in history. Default: 10 for each of the categories Running, Success, and Failed
- -j JOBS, --jobs JOBS: sets the path to the folder where all the new jobs will be added. Default: ~/airflow/jobs
- -t DAG_TIMEOUT, --timeout DAG_TIMEOUT: sets the timeout (in seconds) for importing all the DAGs from the DAG folder. Default: 30 seconds
- -r WEB_INTERVAL, --refresh WEB_INTERVAL: sets the webserver workers refresh interval (in seconds). Default: 30 seconds
- -w WEB_WORKERS, --workers WEB_WORKERS: sets the number of webserver workers to be refreshed at the same time. Default: 1
- -p THREADS, --threads THREADS: sets the number of threads for the Airflow Scheduler. Default: 2
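For example, to initialize with a larger job history and more scheduler threads (the values below are illustrative, not recommendations):

```shell
$ cwl-airflow init --limit 20 --threads 4
```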
If the core/dags_folder parameter in the Airflow configuration file (default location ~/airflow/airflow.cfg) has been updated manually, make sure to rerun cwl-airflow init.
Running
Batch mode
To automatically monitor and process all the job files present in a specific folder:
- Make sure your job files include the following mandatory fields:
  - uid: unique ID, string
  - output_folder: absolute path to the folder to save results, string
  - workflow: absolute path to the workflow to be run, string
  Additionally, job files may include the tmp_folder parameter to point to the temporary folder absolute path.
- Put your JSON/YAML job files into the directory set as jobs in the cwl section of the airflow.cfg file (by default ~/airflow/cwl/jobs)
- Run Airflow scheduler:
$ airflow scheduler
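As a sketch, the steps above amount to dropping a file like the following into the default jobs folder; the field names come from the list of mandatory fields, while the uid value and all paths are placeholders you would replace with your own:

```shell
# Create a minimal YAML job file in the default jobs folder.
# The uid value and all paths below are illustrative placeholders.
mkdir -p ~/airflow/cwl/jobs
cat > ~/airflow/cwl/jobs/example-job.yaml <<'EOF'
uid: "example-run-0001"
workflow: "/absolute/path/to/workflow.cwl"
output_folder: "/absolute/path/to/results"
tmp_folder: "/tmp/example-run-0001"
EOF
```

Once the scheduler is running, new job files added to this folder should be picked up and processed automatically.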
Manual mode
To perform a single run of the specific CWL workflow and job files:
$ cwl-airflow run WORKFLOW_FILE JOB_FILE
If the uid, output_folder, workflow, and tmp_folder fields are not present
in the job file, you may set them with the following arguments:
-o, --outdir Output directory, default current directory
-t, --tmp Folder to store temporary data, default /tmp
-u, --uid Unique ID, default random uuid
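For instance, a one-off run that overrides the output and temporary folders might look like the following; the workflow and job file names and both paths are illustrative:

```shell
$ cwl-airflow run --outdir ~/results --tmp /tmp/cwl my_workflow.cwl my_job.yaml
```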
Demo mode
- Get the list of the available demo workflows to run
$ cwl-airflow demo
- Run a demo workflow from the list (if running on macOS, consider adding the directory where you installed the cwl-airflow package to the Docker / Preferences / File Sharing options)
$ cwl-airflow demo super-enhancer.cwl
- Optionally, run Airflow webserver to check workflow status (default webserver link):
$ airflow webserver