A Python package that enables users to build a custom Singularity image on an HPC cluster

Building a Singularity container for HPC using globus-compute

Context

  • One of the execution configurations of globus-compute requires a registered container, which is spun up to execute the user's function on the HPC cluster.

  • HPC clusters do not run Docker containers (for security reasons, as discussed here) and support only an Apptainer/Singularity image.

  • Installing Apptainer to build the Singularity image locally is not straightforward, especially on Windows and macOS systems, as discussed in the documentation.

Using this Python library, the user can specify a custom image specification to build an Apptainer/Singularity image, which is in turn used to run their functions on globus-compute. The library registers the container and returns the container id, which is used by the globus-compute executor to execute the user's function.

Prerequisite.

A globus-compute-endpoint set up on an HPC cluster.

The following steps can be used to create an endpoint on the NCSA Delta cluster; you can modify the configuration based on your use case:

Note.

For the following to work, you must use globus-compute-sdk version 2.2.0 when setting up the endpoint. It is recommended to use Python 3.9 both for the endpoint and for the client.

  1. Create a conda virtual env. We created a custom-image-builder conda env on the Delta cluster as follows:
conda create --name custom-image-builder-py-3.9 python=3.9

conda activate custom-image-builder-py-3.9

pip install globus-compute-endpoint==2.2.0
  2. Create a globus-compute endpoint:
globus-compute-endpoint configure custom-image-builder

Update the endpoint config at ~/.globus_compute/custom-image-builder/config.py to :

from parsl.addresses import address_by_interface
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

from globus_compute_endpoint.endpoint.utils.config import Config
from globus_compute_endpoint.executors import HighThroughputExecutor


user_opts = {
    'delta': {
        'worker_init': 'conda activate custom-image-builder-py-3.9',
        'scheduler_options': '#SBATCH --account=bbmi-delta-cpu',
    }
}

config = Config(
    executors=[
        HighThroughputExecutor(
            max_workers_per_node=10,
            address=address_by_interface('hsn0'),
            scheduler_mode='soft',
            worker_mode='singularity_reuse',
            container_type='singularity',
            container_cmd_options="",
            provider=SlurmProvider(
                partition='cpu',
                launcher=SrunLauncher(),

                # string to prepend to #SBATCH blocks in the submit
                # script to the scheduler eg: '#SBATCH --constraint=knl,quad,cache'
                scheduler_options=user_opts['delta']['scheduler_options'],
                worker_init=user_opts['delta']['worker_init'],
                # Command to be run before starting a worker, such as:
                # 'module load Anaconda; source activate parsl_env'.

                # Scale between 0 and 1 blocks with 1 node per block
                nodes_per_block=1,
                init_blocks=0,
                min_blocks=0,
                max_blocks=1,

                # Hold blocks for 30 minutes
                walltime='00:30:00'
            ),
        )
    ],
)
  3. Start the endpoint and store the endpoint id, which is used in the following example:
globus-compute-endpoint start custom-image-builder

Example

Consider a use case where the user wants to execute a pandas operation on the HPC cluster using globus-compute. They need a Singularity image to be used by the globus-compute executor. The library can be leveraged as follows:

Locally, you need to install the following packages; you can create a virtual env as follows:

cd example/

python3.9 -m venv venv

source venv/bin/activate

pip install globus-compute-sdk==2.2.0

pip install custom-image-builder

Then run the following script:

from custom_image_builder import build_and_register_container
from globus_compute_sdk import Client, Executor


def transform():
    import pandas as pd
    data = {'Column1': [1, 2, 3],
            'Column2': [4, 5, 6]}

    df = pd.DataFrame(data)

    return "Successfully created df"


def main():
    image_builder_endpoint = "bc106b18-c8b2-45a3-aaf0-75eebc2bef80"
    gcc_client = Client()

    container_id = build_and_register_container(gcc_client=gcc_client,
                                                endpoint_id=image_builder_endpoint,
                                                image_file_name="my-pandas-image",
                                                base_image_type="docker",
                                                base_image="python:3.8",
                                                pip_packages=["pandas"])

    print("The container id is", container_id)

    with Executor(endpoint_id=image_builder_endpoint,
                  container_id=container_id) as ex:
        fut = ex.submit(transform)

    print(fut.result())


if __name__ == "__main__":
    main()

Note.

The Singularity image requires globus-compute-endpoint as one of its packages in order to run the workers inside our custom Singularity container; hence, by default, Python is required as part of the image so that globus-compute-endpoint can be installed.
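As an illustration, the image built in the example above conceptually corresponds to an Apptainer definition file along these lines. This is a hand-written sketch based on the parameters passed to build_and_register_container (base image python:3.8, pip package pandas), not the exact file the library generates; the globus-compute-endpoint version pin is an assumption for compatibility with the 2.2.0 SDK used above:

```
Bootstrap: docker
From: python:3.8

%post
    # user-requested pip packages
    pip install pandas
    # required by default so the container can run globus-compute workers
    pip install globus-compute-endpoint==2.2.0
```

Because the base image already ships with Python, the default globus-compute-endpoint installation in %post succeeds without any extra setup.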
