Covalent Slurm Plugin

Covalent is a Pythonic workflow tool used to execute tasks on advanced computing hardware. This executor plugin interfaces Covalent with HPC systems managed by Slurm. For workflows to be deployable, users must have SSH access to the Slurm login node, writable storage space on the remote filesystem, and permissions to submit jobs to Slurm.

Installation

To use this plugin with Covalent, simply install it using pip:

pip install covalent-slurm-plugin

On the remote system, the Python version in the environment you plan to use must match that used when dispatching the calculations. Additionally, the remote system's Python environment must have the base covalent package installed (e.g. pip install covalent).
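
The version requirement above can be checked before dispatching. The following is a minimal sketch, not part of the plugin: the helper names `parse_python_version` and `versions_match` are hypothetical, and comparing only the major and minor components is an assumption about how strict the match needs to be. It expects the text printed by running `python --version` in the remote environment.

```python
from __future__ import annotations

import re
import sys


def parse_python_version(version_output: str) -> tuple[int, int]:
    """Extract (major, minor) from text such as 'Python 3.10.12'."""
    match = re.search(r"(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def versions_match(remote_output: str, local: tuple[int, int] | None = None) -> bool:
    """Compare the remote interpreter version against the local one.

    Assumption: agreement on major.minor is treated as a match.
    """
    if local is None:
        # Default to the interpreter that will dispatch the workflow.
        local = (sys.version_info.major, sys.version_info.minor)
    return parse_python_version(remote_output) == local
```

For example, `versions_match("Python 3.10.12")` returns `True` only when the dispatching interpreter is also Python 3.10.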

Usage

The following shows an example of a Covalent configuration that is modified to support Slurm:

[executors.slurm]
username = "user"
address = "login.cluster.org"
ssh_key_file = "/home/user/.ssh/id_rsa"
remote_workdir = "/scratch/user"
cache_dir = "/tmp/covalent"

[executors.slurm.options]
nodes = 1
ntasks = 4
cpus-per-task = 8
constraint = "gpu"
gpus = 4
qos = "regular"

[executors.slurm.srun_options]
cpu_bind = "cores"
gpus = 4
gpu-bind = "single:1"

The first stanza describes default connection parameters for a user who can connect to the Slurm login node using, for example:

ssh -i /home/user/.ssh/id_rsa user@login.cluster.org
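
To confirm that the connection parameters are consistent before dispatching, the same command can be assembled from the first stanza. This is only an illustrative sketch: `ssh_command` is a hypothetical helper, and the dictionary simply mirrors the `username`, `address`, and `ssh_key_file` keys shown in the configuration above.

```python
import shlex


def ssh_command(config: dict) -> str:
    """Build the ssh invocation implied by the connection parameters."""
    args = [
        "ssh",
        "-i",
        config["ssh_key_file"],
        f'{config["username"]}@{config["address"]}',
    ]
    # shlex.join quotes any argument that needs it for a POSIX shell.
    return shlex.join(args)


connection = {
    "username": "user",
    "address": "login.cluster.org",
    "ssh_key_file": "/home/user/.ssh/id_rsa",
}
print(ssh_command(connection))
```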

The second and third stanzas describe default parameters for #SBATCH directives and default parameters passed directly to srun, respectively.

This example generates a script containing the following preamble:

   #!/bin/bash
   #SBATCH --nodes=1
   #SBATCH --ntasks=4
   #SBATCH --cpus-per-task=8
   #SBATCH --constraint=gpu
   #SBATCH --gpus=4
   #SBATCH --qos=regular

and subsequent workflow submission with:

   srun --cpu_bind=cores --gpus=4 --gpu-bind=single:1
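
The mapping from configuration keys to the generated lines can be sketched as follows. This is a simplified illustration of the behavior described above, not the plugin's actual implementation; the function names `sbatch_preamble` and `srun_prefix` are hypothetical, and the sketch assumes every key is rendered as `--key=value` in the order it appears.

```python
def sbatch_preamble(options: dict) -> str:
    """Render each option as an '#SBATCH --key=value' directive."""
    lines = ["#!/bin/bash"]
    lines += [f"#SBATCH --{key}={value}" for key, value in options.items()]
    return "\n".join(lines)


def srun_prefix(srun_options: dict) -> str:
    """Render the srun call with each option as '--key=value'."""
    flags = [f"--{key}={value}" for key, value in srun_options.items()]
    return " ".join(["srun", *flags])


# Values copied from the [executors.slurm.options] and
# [executors.slurm.srun_options] stanzas above.
options = {
    "nodes": 1,
    "ntasks": 4,
    "cpus-per-task": 8,
    "constraint": "gpu",
    "gpus": 4,
    "qos": "regular",
}
srun_options = {"cpu_bind": "cores", "gpus": 4, "gpu-bind": "single:1"}

print(sbatch_preamble(options))
print(srun_prefix(srun_options))
```

Under these assumptions, the printed text reproduces the preamble and the `srun` line shown above.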

To use the configuration settings, an electron’s executor must be specified with a string argument, in this case:

   import covalent as ct

   @ct.electron(executor="slurm")
   def my_task(x, y):
       return x + y

Alternatively, passing a SlurmExecutor instance enables custom behavior scoped to specific tasks. Here, the executor's prerun_commands and postrun_commands parameters can be used to list shell commands to be executed before and after submitting the workflow. These may include any additional srun commands apart from workflow submission. Commands can also be nested inside the submission call to srun by using the srun_append parameter.

More complex jobs can be crafted by using these optional parameters. For example, the instance below runs a job that accesses CPU and GPU resources on a single node, while profiling GPU usage via nsys and issuing complementary commands that pause/resume the central hardware counter.

   executor = ct.executor.SlurmExecutor(
       remote_workdir="/scratch/user/experiment1",
       options={
           "qos": "regular",
           "time": "01:30:00",
           "nodes": 1,
           "constraint": "gpu",
       },
       prerun_commands=[
           "module load package/1.2.3",
           "srun --ntasks-per-node 1 dcgmi profile --pause"
       ],
       srun_options={
           "n": 4,
           "c": 8,
           "cpu-bind": "cores",
           "G": 4,
           "gpu-bind": "single:1"
       },
       srun_append="nsys profile --stats=true -t cuda --gpu-metrics-device=all",
       postrun_commands=[
           "srun --ntasks-per-node 1 dcgmi profile --resume",
       ]
   )

   @ct.electron(executor=executor)
   def my_custom_task(x, y):
       return x + y

Here the corresponding submit script contains the following commands:

   module load package/1.2.3
   srun --ntasks-per-node 1 dcgmi profile --pause

   srun -n 4 -c 8 --cpu-bind=cores -G 4 --gpu-bind=single:1 \
   nsys profile --stats=true -t cuda --gpu-metrics-device=all \
   python /scratch/user/experiment1/workflow_script.py

   srun --ntasks-per-node 1 dcgmi profile --resume
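
The ordering of these commands can be illustrated with a short sketch. It is an approximation written for this example rather than the plugin's own code: `format_flag` and `submit_script_body` are hypothetical names, and the rule that single-letter keys become `-k value` while longer keys become `--key=value` is inferred from the output shown above.

```python
def format_flag(key: str, value) -> str:
    """Short options render as '-k value', long options as '--key=value'."""
    if len(key) == 1:
        return f"-{key} {value}"
    return f"--{key}={value}"


def submit_script_body(prerun_commands, srun_options, srun_append,
                       postrun_commands, workflow_script) -> str:
    """Assemble prerun commands, the srun call, and postrun commands in order."""
    flags = " ".join(format_flag(k, v) for k, v in srun_options.items())
    srun_parts = [f"srun {flags}"]
    if srun_append:
        # Nested command placed between srun and the Python call.
        srun_parts.append(srun_append)
    srun_parts.append(f"python {workflow_script}")
    # Join with a shell line continuation, as in the script above.
    srun_call = " \\\n".join(srun_parts)
    sections = ["\n".join(prerun_commands), srun_call, "\n".join(postrun_commands)]
    # Skip empty sections so no stray blank lines are emitted.
    return "\n\n".join(section for section in sections if section)


print(submit_script_body(
    prerun_commands=[
        "module load package/1.2.3",
        "srun --ntasks-per-node 1 dcgmi profile --pause",
    ],
    srun_options={"n": 4, "c": 8, "cpu-bind": "cores", "G": 4, "gpu-bind": "single:1"},
    srun_append="nsys profile --stats=true -t cuda --gpu-metrics-device=all",
    postrun_commands=["srun --ntasks-per-node 1 dcgmi profile --resume"],
    workflow_script="/scratch/user/experiment1/workflow_script.py",
))
```

Leaving `srun_append` empty in this sketch yields a plain `srun ... \` line followed directly by the Python call.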

Release Notes

Release notes are available in the Changelog.

Citation

Please use the following citation in any publications:

W. J. Cunningham, S. K. Radha, F. Hasan, J. Kanem, S. W. Neagle, and S. Sanand. Covalent. Zenodo, 2022. https://doi.org/10.5281/zenodo.5903364

License

Covalent is licensed under the Apache License 2.0. See the LICENSE file or contact the support team for more details.

Release history

Version	Released	Notes
0.18.0	Jan 26, 2024	this version
0.18.0rc0	Jan 26, 2024	pre-release
0.16.0rc0	May 12, 2023	pre-release
0.15.0rc0	May 12, 2023	pre-release
0.14.0rc0	May 12, 2023	pre-release
0.13.0rc0	May 12, 2023	pre-release
0.12.1.post1	Sep 21, 2023
0.12.1	May 5, 2023
0.12.1rc0	May 5, 2023	pre-release
0.12.0rc0	May 5, 2023	pre-release
0.8.0	Nov 19, 2022
0.8.0rc0	Nov 19, 2022	pre-release
0.7.0	Aug 23, 2022
0.7.0rc0	Aug 23, 2022	pre-release
0.6.0rc0	Aug 18, 2022	pre-release
0.5.2rc0	Aug 18, 2022	pre-release
0.5.1rc0	Aug 14, 2022	pre-release
0.5.0rc0	Aug 14, 2022	pre-release
0.3.1rc0	May 25, 2022	pre-release, yanked
0.3.0	May 26, 2022
0.0.2	Mar 2, 2022
0.0.1	Mar 2, 2022

Download files

Source Distribution

covalent-slurm-plugin-0.18.0.tar.gz (20.5 kB)

Uploaded Jan 26, 2024 Source

Hashes for covalent-slurm-plugin-0.18.0.tar.gz

Algorithm	Hash digest
SHA256	`f7fd2e35f909caa023e652b163a631fb9351b1220e833446b64a9aebe2c53ee4`
MD5	`61c2fe79dbd84f7271ee3e3c451a542c`
BLAKE2b-256	`6dc56a1f4d56a685beab4fa6c2c7f0d55343d73e2921b251b6bebfff3327fecb`