Skip to main content

Manage a multitude of runs through hierarchical configuration

Project description

Code style: black License: MIT PyPI Stats: downloads pre-commit Python Version Package Version Package Status Tests Coverage

runcon

runcon is an MIT-licensed package that provides a Config class with a lot of functionality that helps and simplifies organizing many, differently configured runs (hence the name Run Configuration). Its main target audience are scientists and researchers who run many different experiments either in the real world or a computer-simulated environment and want to control the runs through a base configuration as well as save each run's settings in configuration files. The Config class helps creating differently configured runs through user-configurable hierarchical configuration layouts, it automatically creates paths for each run which can be used to save results, and it helps in comparing the config files of each run during the step of analyzing and comparing different runs.

This package was developed with Deep Learning experiments in mind. These usually consist of large and complex configurations and will therefore also be the basis for the upcoming examples of usage.

You can find the API reference here!

Installation

runcon is in PyPI, so it can be installed directly using:

pip install runcon

Or from GitHub:

git clone https://github.com/demmerichs/runcon.git
cd runcon
pip install .

Basic Usage

This package builts upon PyYAML as a parser for loading and saving configuration files, therefor you should adhere to the YAML-Syntax when writing your configuration.

Loading configurations

You can load from a single file:

from runcon import Config

cfg = Config.from_file("cfgs/file_example.cfg")
print(cfg, end="")

produces

_CFG_ID: 1d4d313eedb05ae00c98ac8cb0a34946

top_level:
  more_levels:
    deep_level_list:
    - list_value
    - null
    - 3+4j
    - 3.14
    - true

Or you can load from a directory, in which case the filenames will become the toplevel keys. The following layout

cfgs
├── dir_example
│   ├── forest.cfg
│   └── garden.cfg

with the following code

cfg = Config.from_dir("cfgs/dir_example", file_ending=".cfg")
print(cfg, end="")

produces

_CFG_ID: 705951e95af9b1f6cf314e0f96835349

forest:
  trees: 1000
  animals: 20

garden:
  trees: 2
  animals: 0

Another way to load multiple configuration files at once is by specifying all the files and their corresponding keys manually.

key_file_dict = {
    "black_forest": "cfgs/dir_example/forest.cfg",
    "random_values": "cfgs/file_example.cfg",
}
cfg = Config.from_key_file_dict(key_file_dict)
print(cfg, end="")

produces

_CFG_ID: 60b454fb7619eb972cec13e99ff6addf

black_forest:
  trees: 1000
  animals: 20

random_values:
  top_level:
    more_levels:
      deep_level_list:
      - list_value
      - null
      - 3+4j
      - 3.14
      - true

Accessing configuration values

The Config object inherets AttrDict (a support class by runcon). Therefore, values can either be accessed the same way as in a dict, or via attribute-access. Additionally, Config supports access via string-concatenation of the keys using a dot as delimiter, e.g.

>>> from runcon import Config
>>> cfg = Config({
...     "top": {
...         "middle": {"bottom": 3.14},
...         "cfg": "value",
...     }
... })
>>> print(cfg.top.middle["bottom"])
3.14
>>> print(cfg["top"].cfg)
value
>>> print(cfg["top.middle.bottom"])
3.14

Creating runs

Most projects managing multiple runs do this by manually labeling different configuration setups for each run. The main drawbacks of this approach for a larger set of runs are:

  • non-deterministic: Different people might label the same configuration differently or different configurations the same way. Even the same person might not remember after a week which settings exactly were changed based on their labeling.
  • non-descriptive: In complex configurations a short label cannot capture all setting changes. Finding these via a diff-view can become daunting and unstructured, making it complicated to easliy grasp all the changes made.

Together with this package we propose an alternate way of structuring runs and configurations and trading of slightly longer "labels" for the removal of the above drawbacks.

Most projects start with a single default configuration, and going from there apply one or more change of settings to produce differently configured runs. We suggest moving all this information into one or multiple configuration files, e.g. a single default configuration, and multiple named setting changes:

# dl_example.cfg
default:
  model:
    name: ResNet
    layers: 50
  batchsize: 16
  optimizer:
    name: Adam
    learningrate: 1e-3
  loss: MSE

small_net:
  model:
    layers: 5

large_net:
  model:
    layers: 100

alex:
  model:
    name: AlexNet
  optimizer:
    name: SGD

large_bs:
  batchsize: 64
  optimizer:
    learningrate: 4e-3

You could now create in code your run configuration like this (but not miss the shortcut after this example):

from copy import deepcopy

base_cfgs = Config.from_file("cfgs/dl_example.cfg")
cfg = deepcopy(base_cfgs.default)
# rupdate works similar to dict.update, but recursivly updates lower layers
cfg.rupdate(base_cfgs.large_net)
cfg.rupdate(base_cfgs.alex)
cfg.loss = "SmoothL1"
cfg.optimizer.learningrate = 1e-4
print(cfg, end="")

produces

_CFG_ID: be99468b9911c12ccba140ae5d9f487a

model:
  name: AlexNet
  layers: 100

batchsize: 16

optimizer:
  name: SGD
  learningrate: 0.0001
  weightdecay: 1.0e-06

loss: SmoothL1

As this pattern of stacking/merging configurations and possibly modifying a few single values is very common or at least the intended way for using this package, there is a simple shortcut function which operates on string input such that a CLI parser can easily pass values to this function.

For example, you might want to run a script specifying the above constructed configuration like this:

python your_runner_script.py \
    --cfg default large_net alex \
    --set \
        loss SmoothL1 \
        optimizer.learningrate 1e-4

The details of how your CLI interface should look and how you want to parse the values is left to you, (e.g. you could leave out default if you have only a single default configuration and just add it inside your code after CLI invocation), but parsing the above command options into the following all-strings variables

cfg_chain = ["default", "large_net", "alex"]
set_values = [
    "loss", "SmoothL1",
    "optimizer.learningrate", "1e-4",
]

would allow you to call

base_cfgs = Config.from_file("cfgs/dl_example.cfg")
cfg = base_cfgs.create(cfg_chain, kv=set_values)
print(cfg, end="")

and produces (using internally ast.literal_eval to parse non-string values, like booleans or floats, in this example 1e-4)

_CFG_ID: be99468b9911c12ccba140ae5d9f487a

model:
  name: AlexNet
  layers: 100

batchsize: 16

optimizer:
  name: SGD
  learningrate: 0.0001
  weightdecay: 1.0e-06

loss: SmoothL1

The resulting label for this configuration would then consist of the configuration chain and the single key-value pairs, and can be automatically reconstructed from the base configs, e.g.

print(cfg.create_auto_label(base_cfgs))

produces

default alex large_net -s optimizer.learningrate 0.0001 loss SmoothL1

Given the run configuration and the set of base configurations, this label can always deterministically be created, and making it shorter is just a matter of wrapping more key-value pairs or base configs into meta configurations.

For the above example this could mean just adding a smoothl1 sub config which also changes the learning rate, e.g.

base_cfgs.smoothl1 = Config({"loss": "SmoothL1", "optimizer": {"learningrate": 0.0001}})
print(cfg.create_auto_label(base_cfgs))

produces

default smoothl1 alex large_net

This approach mitigates both drawbacks mentioned earlier. The labels are deterministic, and based on the labels, it is quite easy to read of the changes made to the default configuration, as the label itself describes hierarchical changes and the base configurations modifying the default configuration are considered to be minimalistic.

Organizing runs

After creating your run configuration in your script, it is time to create a directory for your new run, and using it to dump your results from that run.

cfg_dir = cfg.initialize_cfg_path(base_path="/tmp/Config_test", timestamp=False)
print(type(cfg_dir), cfg_dir)

produces

<class 'pathlib.PosixPath'> /tmp/Config_test/8614010d20024c05f815cc8edcc8982f

The path mainly consists of two parts, a time stamp allowing you to store multiple runs with the same configuration (if you specify timestampe=True), and a hash produced by the configuration. Assuming hash collisions are too rare to be ever a problem, two configurations that differ somehow, will always produce different hashes. The hash is used, as it only depends on the configuration, whereas the automatic labeling depends also on the base configuration. The previous section demonstrated, how a change in the base configurations can produce a change in the automatic label. The initialize_cfg_path routine also produces a description folder next to the configuration folders, where symlinks are stored to the configuration folders, but with the automatic labels. This ensures, that the symlinks can easily be recreated based on a changed configuration, without the need to touch the actual run directories.

Another thing that happens during the path initialization is a call to cfg.finalize(). This should mimic the behavior of making all values constant and ensures that the configuration file that was created on disk actually represents all values used during the run execution, and accidental in-place value changes can be mostly ruled out.

try:
  cfg.loss = "new loss"
except ValueError as e:
  print(e)
print(cfg.loss)
cfg.unfinalize()
cfg.loss = "new loss"
print(cfg.loss)

produces

This Config was already finalized! Setting attribute or item with name loss to value new loss failed!
SmoothL1
new loss

License

runcon is released under a MIT license.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

runcon-1.1.11.tar.gz (26.6 kB view hashes)

Uploaded Source

Built Distribution

runcon-1.1.11-py3-none-any.whl (19.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page