Run a series of docker/podman containers, in a coordinated manner

copili - container pipeline

Run a series of containers, in a coordinated manner

Maintainer: tim.bleimehl@dzd-ev.de

Licence: MIT

issue tracker: https://git.connect.dzd-ev.de/dzdtools/pythonmodules/-/issues?label_name%5B%5D=copili

HINT: This Readme is WIP. Expect changes and additions!

What?

copili is a python tool to run a series of scripts that are wrapped into a docker container/image.

You can create pipelines based on containers with central definitions. The pipeline definition supports YAML, JSON, and Python dicts.

copili manages the runs of Docker containers:

  • manage dependencies
  • handle failed runs
  • manage periodic runs
  • manage log(-files)

Example Scenario & Background

copili was created for developing a dataloading pipeline for the Covid*Graph, a Covid19 knowledge graph around a Neo4j database.

In Covid*Graph, many developers contribute scripts in diverse programming languages to load data into the database. We call these scripts dataloaders.

To bootstrap the graph reproducibly and create the environment each dataloader needs, we put all the dataloader scripts into Docker images.

At the beginning we started the containers sequentially, but with a growing count of dataloaders and more complex dependencies among those dataloaders, a manual execution was not feasible anymore.

This is where copili comes in:

With copili we can define a sequence of containers and the dependencies among them.

If we now want to rebuild the graph from scratch, we just need to start copili with our pipeline definition, which lives in a yaml file.

Now everybody can easily get an overview of how the graph is created, or create a local copy of the graph. This is important for us as an open source community project.

We can also add new dataloaders with minimal effort.

On top we can create "service" definitions which automatically update our knowledge graph. More on that in the docs...

Usage

Install

Stable

BRANCH: master

pip3 install git+https://git.connect.dzd-ev.de/dzdpythonmodules/copili.git

Dev

BRANCH: dict2graph-dev

Inactive at the moment: pip3 install git+https://git.connect.dzd-ev.de/dzdpythonmodules/copili.git@dev

Get started

Quick example

This short example gives an impression of how copili works. More detailed explanations follow below.

import docker
import schedule
from copili import Pipeline


d = docker.DockerClient(base_url="unix://var/run/docker.sock")


pipeline_description = """
ExamplePipeline:
    - name: dataloader_02
      image_repo: stakater/exit-container
      dependencies: 
        - dataloader_01
      env_vars: 
        EXIT_CODE: 0
    - name: dataloader_01
      image_repo: stakater/exit-container
    - name: dataloader_03
      image_repo: stakater/exit-container
      dependencies: 
        - dataloader_02
        - dataloader_01
    - name: servicecontainer01
      image_repo: hello-world
      is_service_container: true
      dependencies: 
        - dataloader_02
"""
# pipeline description - this could also be a path to a YAML or JSON file, or just a Python dict

p = Pipeline(description=pipeline_description, docker_client=d)
# run all containers once
p.run()

# Optional: define a custom service schedule (https://schedule.readthedocs.io)
# The default is once a day at 00:00
p.service_schedule = schedule.every(10).minutes.do(p.run_service_containers)

# Step into service mode
p.start_service_mode()

# now servicecontainer01 will run every 10 minutes

Pipeline description format

A pipeline definition consists of a name and an array of container descriptions. These container descriptions can depend on each other. Container descriptions can be provided as a Python dict or as a JSON/YAML string or file.

A pipeline description is handed over to copili via the description parameter of copili.Pipeline

e.g.

from copili import Pipeline

p = Pipeline(description="path/to/my/pipelinefile.json")
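
Since a description can also be a Python dict, one option is to build it in code and write it to a JSON file. The following is a minimal sketch using only the standard library. The pipeline name, container names, and file name are made up for illustration, and the final Pipeline call is only shown as a comment because it needs copili and a Docker client.

```python
import json
import tempfile
from pathlib import Path

# Pipeline description as a plain dict: one pipeline name mapping to a
# list of container descriptions (all names here are illustrative only).
pipeline_description = {
    "my-pipeline-name": [
        {"name": "my-first-container", "image_repo": "hello-world"},
        {
            "name": "my-second-container",
            "image_repo": "chentex/random-logger",
            "dependencies": ["my-first-container"],
        },
    ]
}

# Write the description to a JSON file ...
description_path = Path(tempfile.mkdtemp()) / "pipelinefile.json"
description_path.write_text(json.dumps(pipeline_description, indent=3))

# ... and read it back to confirm both forms carry the same information.
loaded_description = json.loads(description_path.read_text())

# Either form could then be handed over, e.g. (requires copili and Docker):
# p = Pipeline(description=str(description_path), docker_client=d)
```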

Container description properties

A container description can have following properties

name

Name of the container description. Serves as identifier within copili.

Mandatory: True
Type (python/json/yaml): string
Default: None
Example value(s): MY_FIRST_PIPELINE_CONTAINER

info_link

Link to the code repository or some other info about the pipeline member

Mandatory: True
Type (python/json/yaml): string
Default: None
Example value(s): https://github.com/me/myrepo

desc

Short description of the pipeline member

Mandatory: True
Type (python/json/yaml): string
Default: None
Example value(s): Loads stuff into the database

image_repo

Name of the repo where copili can download the image from. Usually a Docker Hub repo; custom registries are supported.

Mandatory: True
Type (python/json/yaml): string
Default: None
Example value(s): my-docker-namespace/my-container, my-own-registry.com:443/my-own-namespace/my-container

image_reg_username

If authentication is required to download the image from a certain registry, we can pass a username here. (SECURITY HINT: Environment variables are supported as well and should be used here.)

Mandatory: False
Type (python/json/yaml): string
Default: None
Example value(s): my-username, ${USERNAME-FROM-DOT-ENV_FILE}

image_reg_password

If authentication is required to download the image from a certain registry, we can pass a password here. (SECURITY HINT: Environment variables are supported as well and should be used here.)

Mandatory: False
Type (python/json/yaml): string
Default: None
Example value(s): my-password, $PASSWORD-FROM-SYSTEM-ENV-VAR

tag

The tag of the image

Mandatory: False
Type (python/json/yaml): string
Default: latest
Example value(s): stable, beta01, yetanothertag

is_service_container

Defines whether the container runs once per pipeline run or periodically (if the pipeline enters service mode). See "Container description types" for more details.

Mandatory: False
Type (python/json/yaml): bool
Default: False
Example value(s): True

env_vars

Provide custom environment variables per container

Mandatory: False
Type (python/json/yaml): dict/json-object/record
Default: {}
Example value(s): {'MY_ENV_VAR': 'value01', 'MY_OTHER_ENV_VAR': 'val02'}

dependencies

Provide a list of copili container description names which need to run successfully before this container is allowed to run

Mandatory: False
Type (python/json/yaml): list of strings
Default: []
Example value(s): ['NAME_OF_OTHER_CONTAINER', 'NAME_OF_ANOTHER_CONTAINER']

exlude_in_env

Skip this container if we run in a certain environment. The current environment is set via the environment variable ENV.

Mandatory: False
Type (python/json/yaml): list of strings
Default: []
Example value(s): ['PROD', 'QA']
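
How copili evaluates this property internally is not shown in this README. As a rough sketch of the idea, assuming the current environment name is simply compared against the list, the filtering could look like this. The helper name containers_for_env and the container names are hypothetical; the property key is spelled as documented above.

```python
def containers_for_env(container_descriptions, env_name):
    """Return the descriptions that are not excluded for env_name."""
    return [
        c
        for c in container_descriptions
        # A missing "exlude_in_env" key means: never excluded.
        if env_name not in c.get("exlude_in_env", [])
    ]


containers = [
    {"name": "dataloader_01", "image_repo": "stakater/exit-container"},
    {
        "name": "test_data_loader",
        "image_repo": "stakater/exit-container",
        "exlude_in_env": ["PROD", "QA"],
    },
]

# In a real run the name would come from the environment,
# e.g. os.environ.get("ENV"); fixed values keep the sketch deterministic.
prod_names = [c["name"] for c in containers_for_env(containers, "PROD")]
dev_names = [c["name"] for c in containers_for_env(containers, "DEV")]
print(prod_names)  # ['dataloader_01']
print(dev_names)   # ['dataloader_01', 'test_data_loader']
```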

volumes

A volumes description. The format is defined by the Python Docker SDK; see its volumes parameter.

Mandatory: False
Type (python/json/yaml): dict/json-object/record
Default: {}
Example value(s): {"/tmp/data": {"bind": "/data/", "mode": "rw"}}, {'/home/user1/': {'bind': '/mnt/vol2', 'mode': 'rw'}, '/var/www': {'bind': '/mnt/vol1', 'mode': 'ro'}}

command

Docker command list. Similar to docker compose command

Mandatory: False
Type (python/json/yaml): list of strings
Default: []
Example value(s): ['-p', '3000']

sidecars

Start helper containers with your container. E.g. if your container needs a redis database for caching

Mandatory: False
Type (python/json/yaml): list of container descriptions
Default: []
Example value(s): [{"name": "redis01", "image_repo": "redis"}]

json-Pipeline Description

To provide a pipeline description via JSON, provide a JSON object whose key is the pipeline name and whose value is the list of container descriptions

{
   "my-pipeline-name":[
      {
         "name":"my-first-container",
         "image_repo":"hello-world"
      }
   ]
}

This will run the container hello-world once, when the pipeline is started.

Now, let's add another container that is only allowed to run if our hello-world container ran successfully:

{
   "my-pipeline-name":[
      {
         "name":"my-first-container",
         "image_repo":"hello-world"
      },
      {
         "name":"my-second-container",
         "image_repo":"chentex/random-logger",
         "dependencies":[
            "my-first-container"
         ]
      }
   ]
}

This again will run our hello-world container and after that the chentex/random-logger container.

Note that the order of the container descriptions in the list does not matter for the dependencies. copili figures out the needed sequence itself.
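
The idea of deriving a run order from declared dependencies can be illustrated with a topological sort. This is only a sketch using graphlib from the Python standard library (Python 3.9+), not copili's actual implementation. The container names are taken from the quick example.

```python
from graphlib import TopologicalSorter

# Container descriptions in arbitrary order, as in the quick example.
containers = [
    {"name": "dataloader_02", "dependencies": ["dataloader_01"]},
    {"name": "dataloader_01"},
    {"name": "dataloader_03", "dependencies": ["dataloader_02", "dataloader_01"]},
]

# Map each container name to the set of names it depends on.
graph = {c["name"]: set(c.get("dependencies", [])) for c in containers}

# static_order() yields every node after all of its dependencies.
run_order = list(TopologicalSorter(graph).static_order())
print(run_order)  # ['dataloader_01', 'dataloader_02', 'dataloader_03']
```

TopologicalSorter raises graphlib.CycleError if the dependencies form a cycle, which is a convenient way to catch pipeline descriptions that can never run.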

Now, let's add a sidecar container to our second container

{
   "my-pipeline-name":[
      {
         "name":"my-first-container",
         "image_repo":"hello-world"
      },
      {
         "name":"my-second-container",
         "image_repo":"chentex/random-logger",
         "dependencies":[
            "my-first-container"
         ],
         "sidecars":[
            {
               "name":"redis01",
               "image_repo":"redis"
            }
         ]
      }
   ]
}

This again will run our hello-world container and after that the chentex/random-logger container. In addition, a redis container will be started alongside the second container. This can be helpful for containers that need a caching database, for example.

yaml-Pipeline Description

Same rules apply for yaml pipeline descriptions as for json.

JSON follows the same structure as YAML and is just another way of formatting the same information. See https://www.json2yaml.com/

Also have a look at the quick start example, which is provided in yaml format

Container description types

Via the property is_service_container we can define whether a container is a static or a service container.

  • static

    A static container runs only once when the pipeline is started. If you want the container to run only on the first pipeline run, set copili.Pipeline.container_did_run_check_override_callback and provide the information whether the container has already run (e.g. from a database)

  • service

    Container will run periodically
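
To make the distinction concrete, here is a small sketch that sorts container descriptions into the two types based on is_service_container, using the documented default of False. The helper split_by_type is hypothetical and only illustrates the classification, not how copili schedules the containers.

```python
def split_by_type(container_descriptions):
    """Return (static_names, service_names) for the given descriptions."""
    static_names, service_names = [], []
    for c in container_descriptions:
        # is_service_container defaults to False, i.e. a static container.
        if c.get("is_service_container", False):
            service_names.append(c["name"])
        else:
            static_names.append(c["name"])
    return static_names, service_names


containers = [
    {"name": "dataloader_01", "image_repo": "stakater/exit-container"},
    {"name": "dataloader_02", "image_repo": "stakater/exit-container"},
    {
        "name": "servicecontainer01",
        "image_repo": "hello-world",
        "is_service_container": True,
    },
]

static_names, service_names = split_by_type(containers)
print(static_names)   # ['dataloader_01', 'dataloader_02']
print(service_names)  # ['servicecontainer01']
```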

Environment Variable Support

You can use environment variables (https://en.wikipedia.org/wiki/Environment_variable) in the pipeline description.

Either set system environment variables (e.g. export MYPASSWORD=hello123) or pass a .env file.
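
How copili resolves these placeholders internally is not documented here. As a minimal sketch of the general idea, assuming $VAR and ${VAR} placeholders in string values are replaced from a mapping of variables, string.Template from the standard library can be used. The helper expand_placeholders, the variable names, and the values are all made up for illustration.

```python
from string import Template


def expand_placeholders(description, env):
    """Recursively replace $VAR / ${VAR} in all string values."""
    if isinstance(description, str):
        # safe_substitute leaves unknown placeholders untouched.
        return Template(description).safe_substitute(env)
    if isinstance(description, dict):
        return {k: expand_placeholders(v, env) for k, v in description.items()}
    if isinstance(description, list):
        return [expand_placeholders(v, env) for v in description]
    return description


container = {
    "name": "my-first-container",
    "image_repo": "my-own-registry.com:443/my-own-namespace/my-container",
    "image_reg_username": "${REGISTRY_USER}",
    "image_reg_password": "$REGISTRY_PASSWORD",
}

# Stand-in for os.environ, so the example does not depend on the machine.
fake_env = {"REGISTRY_USER": "alice", "REGISTRY_PASSWORD": "s3cret"}

resolved = expand_placeholders(container, fake_env)
print(resolved["image_reg_username"])  # alice
```

Note that string.Template only recognizes identifiers made of letters, digits, and underscores, so variable names containing hyphens would not be expanded by this sketch.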

Pipeline class

todo

ContainerManager class

Attributes

  • Image: Instance of docker.models.images.Image. The image the container will run on

  • Container: Instance of docker.models.containers.Container. The actual Python representation of the Docker container

  • exit_code: None as long as the container has not exited. 0 if the container ran successfully. > 0 if the container failed to run
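
The three exit_code states described above can be mapped to a readable status as follows. This is an illustrative helper only; describe_exit_code is not part of copili.

```python
def describe_exit_code(exit_code):
    """Translate an exit_code value into a short status string."""
    if exit_code is None:
        # The container has not exited (yet).
        return "not finished"
    if exit_code == 0:
        return "success"
    return "failed"


statuses = [describe_exit_code(code) for code in (None, 0, 1)]
print(statuses)  # ['not finished', 'success', 'failed']
```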

..ToBeCompleted

Callback / Function overrides

copili.Pipeline.container_pre_pull_callback(copili.ContainerManager)

Will be called before the image for the container is pulled

copili.Pipeline.container_pre_run_callback(copili.ContainerManager)

Will be called before the container is started

copili.Pipeline.container_post_run_callback(copili.ContainerManager)

Will be called after the container has exited

copili.Pipeline.container_did_run_check_override_callback(copili.ContainerRegistryItem) -> Bool

Will be called before the container is started. If the function returns 'False', the container run will be skipped

copili.Pipeline.container_dependency_check_override_callback(copili.ContainerManager, List[copili.ContainerManager]) -> Bool

Will be called before the container is started. If the function returns 'False', the current dependency branch will be stopped. Can be used to check whether all previously run containers satisfy all dependencies.

If set to None, `copili` checks the dependencies by verifying that all containers listed in `copili.ContainerRegistryItem.dependencies` ran with exit code `0`.

If you need a more sophisticated dependency check, use this function (e.g. a check that takes the state of previous pipeline runs into account, where this state information is stored in an external database).
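
A function with the documented shape (one container plus a list of containers, returning a bool) might look like the sketch below, which mirrors the exit-code rule described above. The attribute names name and dependencies on the passed objects are assumptions made for this example, and SimpleNamespace objects stand in for copili's classes so the code runs without copili or Docker.

```python
from types import SimpleNamespace


def all_dependencies_succeeded(container, finished_containers):
    """Return True only if every dependency finished with exit code 0."""
    exit_codes = {c.name: c.exit_code for c in finished_containers}
    # A dependency that has not run at all yields None and fails the check.
    return all(exit_codes.get(dep) == 0 for dep in container.dependencies)


# Stand-ins for copili objects (assumed attributes, demonstration only).
loader_01 = SimpleNamespace(name="dataloader_01", dependencies=[], exit_code=0)
loader_02 = SimpleNamespace(
    name="dataloader_02", dependencies=["dataloader_01"], exit_code=1
)
loader_03 = SimpleNamespace(
    name="dataloader_03",
    dependencies=["dataloader_02", "dataloader_01"],
    exit_code=None,
)

ok_02 = all_dependencies_succeeded(loader_02, [loader_01])
ok_03 = all_dependencies_succeeded(loader_03, [loader_01, loader_02])
print(ok_02, ok_03)  # True False

# Assumed wiring, based on the attribute name documented above:
# p.container_dependency_check_override_callback = all_dependencies_succeeded
```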

..ToBeCompleted

Development

git clone ssh://git@git.connect.dzd-ev.de:22022/dzdpythonmodules/copili.git

pip install -e .

ToDo:

  • Custom schedules per service container
  • As an alternative to a Docker image, allow providing a git repo with a Dockerfile, which will be built and run
