
Autosubmit: a versatile tool for managing Global Climate Coupled Models in Supercomputing Environments

Autosubmit is a tool to create, manage and monitor experiments remotely, via ssh,
on configured computing clusters, HPCs and supercomputers.


HOW TO DEPLOY/SETUP AUTOSUBMIT FRAMEWORK
========================================

- Autosubmit has been tested:

with the following Operating Systems:
* Linux Debian

on the following HPCs/clusters:
* Ithaca (IC3 machine)
* MareNostrum (BSC machine)
* MareNostrum3 (BSC machine)
* HECToR (EPCC machine)
* Lindgren (PDC machine)
* C2A (ECMWF machine)
* ARCHER (EPCC machine)

- Prerequisites: These packages (bash, python2, sqlite3, git-scm > 1.8.2, subversion) must be available on the local
machine. These packages (argparse, dateutil, pyparsing, numpy, pydotplus, matplotlib) must be available to the
Python runtime. The local machine must also be able to access the HPCs/clusters via password-less ssh.
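
The prerequisite list above can be checked with a small script before installing. The following is a minimal sketch, not part of Autosubmit: it uses the Python 3 standard library, it assumes that "git" and "svn" are the executable names for git-scm and subversion, and the helper names are hypothetical. The module check only covers the interpreter that runs the script, so run it with the interpreter Autosubmit will use, or adapt it accordingly.

```python
import importlib.util
import shutil

# Names taken from the prerequisites list; "git", "svn" and "ssh" are
# assumed executable names, not something the README states.
REQUIRED_COMMANDS = ["bash", "sqlite3", "git", "svn", "ssh"]
REQUIRED_MODULES = ["argparse", "dateutil", "pyparsing", "numpy",
                    "pydotplus", "matplotlib"]


def missing_commands(names):
    """Return the commands from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names):
    """Return the top-level modules the current interpreter cannot find."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("missing commands:", missing_commands(REQUIRED_COMMANDS))
    print("missing modules:", missing_modules(REQUIRED_MODULES))
```

Two empty lists mean that everything in the sketch's lists was found. The git version requirement (> 1.8.2) is not checked here.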

- Install Autosubmit
> pip install autosubmit
or download, unpack and run "python setup.py install"

- Create a repository for experiments: for example "/cfu/autosubmit", then
set the repository path (LOCAL_ROOT_DIR) in autosubmit/config/dir_config.py

- Create a blank database: for example "autosubmit.db" in the repository created above:
> cp autosubmit/database/data/autosubmit.sql /cfu/autosubmit/
> cd /cfu/autosubmit
> sqlite3 autosubmit.db
sqlite> .read autosubmit.sql
sqlite> .quit
> chmod 775 autosubmit.db
then set the database file path and name (DB_DIR, DB_FILE, DB_NAME) in autosubmit/config/dir_config.py
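
The same database can also be created without the interactive sqlite3 shell. This is a minimal sketch using Python's standard sqlite3 module; the function name create_database is hypothetical and not part of Autosubmit, and the paths are the example paths used above.

```python
import os
import sqlite3


def create_database(schema_path, db_path):
    """Create db_path and run every SQL statement found in schema_path."""
    with open(schema_path) as handle:
        schema = handle.read()
    connection = sqlite3.connect(db_path)
    try:
        # executescript runs the whole file, like ".read" in the sqlite3 shell
        connection.executescript(schema)
        connection.commit()
    finally:
        connection.close()
    # mirrors "chmod 775 autosubmit.db"
    os.chmod(db_path, 0o775)
```

For the example repository this would be called as create_database("/cfu/autosubmit/autosubmit.sql", "/cfu/autosubmit/autosubmit.db").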


HOW TO USE AUTOSUBMIT
=====================

To run Autosubmit experiments at CFU, a production environment is set up on the local virtual machine "enterprise".

> cd bin

> python expid.py -h

> python expid.py --new --HPC ithaca --description "experiment is about..."

For example, "cxxx" is a 4-character expid generated automatically by the system.
The first character "c" represents the platform, such as "i" for ithaca, "b" for bsc,
"h" for hector, "l" for lindgren, "e" for ecmwf and "m" for marenostrum3, etc. The remaining
three characters "xxx" form a unique alphanumeric identifier for the experiment.
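
The naming scheme can be illustrated with a short parser. This is only a sketch of the convention described above, not Autosubmit's own code: the names PLATFORM_LETTERS and split_expid are hypothetical, the mapping contains only the letters listed in this README (which ends its list with "etc."), and lowercase characters are an assumption.

```python
import re

# Platform letters as listed in the text; the list is not exhaustive.
PLATFORM_LETTERS = {
    "i": "ithaca",
    "b": "bsc",
    "h": "hector",
    "l": "lindgren",
    "e": "ecmwf",
    "m": "marenostrum3",
}

# One platform letter followed by three alphanumeric characters
# (lowercase is an assumption made for this sketch).
EXPID_PATTERN = re.compile(r"([a-z])([a-z0-9]{3})")


def split_expid(expid):
    """Return (platform_name_or_None, three_character_identifier)."""
    match = EXPID_PATTERN.fullmatch(expid)
    if match is None:
        raise ValueError("not a 4-character expid: %r" % expid)
    letter, identifier = match.groups()
    return PLATFORM_LETTERS.get(letter), identifier
```

A platform letter that is not in the mapping yields None for the platform name, since the README does not list every letter.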

> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf

> vi /cfu/autosubmit/cxxx/conf/autosubmit_cxxx.conf

> python create_exp.py cxxx

> ssh enterprise

> cd bin

> nohup python autosubmit.py cxxx >& cxxx_01.log &

Cautions:
- Before launching autosubmit, check that password-less ssh works for every HPC the experiment uses, for example:
> ssh ithaca
- After launching autosubmit, be aware of the login expiry limit and policy (if applicable for the HPC)
and renew the login access accordingly (by using a token/key etc.) before it expires.
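
The password-less ssh check can be scripted so that it never stops at a password prompt. The following is a minimal sketch, not part of Autosubmit: the function names are hypothetical, and it relies on the standard OpenSSH options BatchMode (fail instead of prompting) and ConnectTimeout.

```python
import subprocess


def ssh_check_command(host, timeout_seconds=10):
    """Build an ssh command that fails instead of asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=%d" % timeout_seconds,
        host,
        "true",  # trivial remote command; only the exit status matters
    ]


def can_login_without_password(host, timeout_seconds=10):
    """Return True if a non-interactive ssh login to host succeeds."""
    try:
        result = subprocess.run(
            ssh_check_command(host, timeout_seconds),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # ssh itself could not be started (for example, not installed)
        return False
    return result.returncode == 0
```

Calling can_login_without_password("ithaca") before each launch, and again before a token or key is due to expire, gives a quick yes/no answer for each HPC.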

HOW TO MONITOR EXPERIMENT
=========================

> cd bin

> python monitor.py -h

> python monitor.py -e cxxx -j job_list -o pdf
or
> python monitor.py -e cxxx -j job_list -o png

The generated plot, named with a date and time stamp, can be found at:

/cfu/autosubmit/cxxx/plot/cxxx_date_time.pdf
or
/cfu/autosubmit/cxxx/plot/cxxx_date_time.png
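
Because each plot file name carries a time stamp, the newest plot has to be looked up. This is a minimal sketch that picks the most recently modified file in the plot directory; latest_plot is a hypothetical helper, and the exact date_time format is not assumed, only the "expid_" prefix and the extension shown above.

```python
import glob
import os


def latest_plot(root_dir, expid, extension="pdf"):
    """Return the most recently modified plot for expid, or None."""
    pattern = os.path.join(root_dir, expid, "plot",
                           "%s_*.%s" % (expid, extension))
    matches = glob.glob(pattern)
    if not matches:
        return None
    # choose by modification time, not by parsing the time stamp in the name
    return max(matches, key=os.path.getmtime)
```

With the example repository, latest_plot("/cfu/autosubmit", "cxxx", "png") would return the newest PNG plot, if any exists.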


HOW TO RESTART EXPERIMENT
=========================

> cd bin

> python recovery.py -h

> python recovery.py -e cxxx -j job_list -g # getting/fetching completed files

> python recovery.py -e cxxx -j job_list -s # saving the pickle file

> nohup python autosubmit.py cxxx >& cxxx_02.log &
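
The launch commands in this README write each run to a numbered log file (cxxx_01.log for the first launch, cxxx_02.log after recovery). Picking the next free number can be automated; the sketch below is an illustration only, and next_log_name is a hypothetical helper that is not part of Autosubmit.

```python
import os
import re


def next_log_name(directory, expid):
    """Return the next unused log name of the form expid_NN.log."""
    pattern = re.compile(r"%s_(\d+)\.log" % re.escape(expid))
    numbers = []
    for name in os.listdir(directory):
        match = pattern.fullmatch(name)
        if match:
            numbers.append(int(match.group(1)))
    # start at 01 when no earlier log exists
    return "%s_%02d.log" % (expid, max(numbers, default=0) + 1)
```

The returned name can then be used as the redirection target of the nohup command.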


HOW TO RERUN/EXTEND EXPERIMENT
==============================

> ssh enterprise

> cd bin

> vi /cfu/autosubmit/cxxx/conf/expdef_cxxx.conf # modify RERUN, CHUNKLIST

> python create_exp.py cxxx

> nohup python autosubmit.py cxxx >& cxxx_03.log &

Monitor for RERUN
-----------------

> python monitor.py -e cxxx -j rerun_job_list -o pdf

Recovery for RERUN
------------------

> python recovery.py -e cxxx -j rerun_job_list -g

> python recovery.py -e cxxx -j rerun_job_list -s


Source distribution: autosubmit-3.0.0a24.tar.gz (50.1 kB)
