Skip to main content

A scheduler for resource-aware parallel computing on clusters.

Project description

Grain

A scheduler built with trio for resource-aware parallel computing on clusters.

TL;DR

Three core functions for you to run async jobs in an arbitary mix of parallel and sequential manner.

# Jobs/subtasks inside a waitgroup run parallelly
async with grain.open_waitgroup() as wg:

    # Put a job onto the waitgroup to be executed
    wg.submit(resource, fn, *args, **kwargs)

    # Put a subtask onto the waitgroup. Submit jobs / 
    # start other subtasks inside the subtask.
    wg.start_subtask(vfn, *args, **kwargs)

# Waitgroup blocks here until all of its jobs are done,
# so outside a waitgroup is essentially sequencial.

results = wg.results # sorted in the order of submission


# Execute one job sequentially
result = await grain.exec1(resource, fn, *args, **kwargs)

Entrypoint:

async def main(): # top-level subtask
    # Submit jobs / start subtasks here
grain.run_combine(main, [worker1_addr, worker2_addr, ...], resource_per_worker)
# ... Or for top-level parallelism, ...
#grain.run_combine([main1, main2, ...], ...)

Check out example for complete demos / more patterns and configuration sample.

Resource-awareness

Every job in the job queue has a resource request infomation along with the job to run. Before the executor run each job, it queries each worker for resource availability. If resource is insufficient, the job queue is suspended until completed jobs return resources. Resources can be CPU cores, virtual memory, both, (or anything user defined following interface grain.resource.Resource).

Every time a job function runs, it has access to grain.GVAR.res, a context-local variable giving the information of specific resource dedicated to the job. (e.g. if a job is submitted with CPU(3), asking for 3 cores, it might receive allocation like CPU([6,7,9]).)

Executor, Workers and communication

The top-level APIs (i.e. "combine") are built upon an executor-like backend called grain.GrainExecutor. It schedules and dispatches jobs to workers, and it maintains a single job queue and a result queue. The executor usually runs on the head node in a cluster.

Workers, one per node, simply receive async functions (i.e. jobs) from the executor and run them. Executor and workers use socket for communication, and dill serializes the functions to byte payloads.

Acknowledgement

The API of Grain is largely insipred by structured concurrency, a major design principle behind Trio, and it is specifically inspired by the API of Trio. And of course, Grain uses Trio internally.

Caveat

Relative import (import not on Python package path) should be within the job function. Global reference fails in this case.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

grain-scheduler-0.12.1.tar.gz (26.5 kB view hashes)

Uploaded Source

Built Distribution

grain_scheduler-0.12.1-py3-none-any.whl (32.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page