
Multi-objective optimization of chemical processes with automated machine learning workflows


Nomadic Exploratory Multi-objective Optimisation (NEMO)

Meet NEMO - our ‘Nomadic Explorer’. NEMO is quite the connoisseur when it comes to machine learning optimisation - only the best model types and model parameters will suffice. In the case of a single dataset, NEMO will scour the lands for the optimal model type and parameters to fit a given dataset. A range of outputs will then be generated for you to assess, interpret and utilise your newly created model.

If you decide to take your analyses a step further into the realms of multi-objective Bayesian optimisation, then our Nomadic Explorer will tirelessly search for the best model type and parameters at each iteration of the optimisation. At each stage, the optimal set of conditions will be provided to aid your pursuit of the elusive multi-dimensional Pareto front.

NEMO is prepared for the journey with a cavernous bag of tools. However, if your aspirations are more exotic, then NEMO supports the inclusion of custom models, samplers and functions.

Check out the examples to see NEMO in action and get started with your own ML workflows.

What is NEMO?

NEMO is a package designed for Bayesian optimisation of one or multiple objectives simultaneously, with a focus on chemical processes.

Installation

To install NEMO via pip:

pip install nemo-bo

How does NEMO work?

First, the parameters (variables) and targets (objectives) of a chemical process are provided to the algorithm. Given a dataset from prior experiments, NEMO identifies the relationship between the parameters and targets and then suggests the most promising parameters to use in the next optimisation iteration.

In contrast to other open-source optimisation libraries, NEMO automatically optimises the hyperparameters of various machine learning models and selects the one with the best predictive accuracy for a given objective. This ensures that the model is continuously re-optimised over the course of an optimisation campaign. Furthermore, NEMO natively supports objectives that can be calculated directly when the exact relationship between the parameters and the target (e.g. materials cost) is known.
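The model-selection idea can be illustrated with a toy sketch. This is not NEMO's implementation: the two candidate predictors (a constant mean and a one-variable least-squares line), the helper names, and the data are all hypothetical, and leave-one-out error is only an assumed stand-in for whatever accuracy metric NEMO uses internally.

```python
import math

def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Candidate 2: one-variable ordinary least squares."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return lambda x: y_bar + slope * (x - x_bar)

def loo_rmse(fit, xs, ys):
    """Leave-one-out root-mean-square error of a fitting routine."""
    sq_errors = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        sq_errors.append((model(xs[i]) - ys[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def select_model(candidates, xs, ys):
    """Return the name of the candidate with the lowest leave-one-out RMSE."""
    scores = {name: loo_rmse(fit, xs, ys) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up, roughly linear data
best, scores = select_model({"mean": fit_mean, "line": fit_line}, xs, ys)
print(best, scores)
```

Repeating this selection whenever new results arrive is one way to keep the chosen model matched to the growing dataset.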

What features are in NEMO?

NEMO includes many machine learning models, acquisition functions, constraints, and sample generators. The base classes for these are also included and can be used as templates for adding your own custom solutions.

The features natively found in NEMO are the following:

Resuming an optimisation

During every iteration, the optimisation progress and ML model information are saved at two points:

  1. When candidates have been suggested
  2. After the new results have been entered, at the end of the iteration

This allows users to resume optimisation runs from two convenient positions.
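A minimal sketch of the two save points, assuming a JSON checkpoint file. The file layout, the stage names, and both helper functions are hypothetical and do not reflect how NEMO stores its runs; the candidate and result values are made up.

```python
import json
import tempfile
from pathlib import Path

def save_checkpoint(path, iteration, stage, candidates, results=None):
    """Write the run state to disk; `stage` records which save point was reached."""
    state = {"iteration": iteration, "stage": stage,
             "candidates": candidates, "results": results}
    Path(path).write_text(json.dumps(state))

def next_action(path):
    """Decide how to resume from the most recent checkpoint."""
    state = json.loads(Path(path).read_text())
    if state["stage"] == "candidates_suggested":
        return f"enter results for iteration {state['iteration']}"
    return f"start iteration {state['iteration'] + 1}"

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "checkpoint.json"
    # Save point 1: candidates suggested, results still pending.
    save_checkpoint(ckpt, 3, "candidates_suggested", [[5.0, 0.1, 40.0, 9.0]])
    after_suggestion = next_action(ckpt)
    # Save point 2: results entered, iteration complete.
    save_checkpoint(ckpt, 3, "results_entered", [[5.0, 0.1, 40.0, 9.0]], [[40.2, 0.11]])
    after_results = next_action(ckpt)

print(after_suggestion)
print(after_results)
```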

Machine learning models available

  1. Gaussian processes (GPs) using the BoTorch library

  2. Various neural networks from the Deeply Uncertain code repository:

    1. Bayesian neural networks
    2. Concrete dropout
    3. Deep ensembles
  3. Various decision-tree based models:

    1. XGBoost Distribution
    2. NGBoost
    3. Random Forest using the forest-confidence-interval package

Variable types available

  1. Continuous variables (ContinuousVariable)
  2. Categorical variables with discrete values (CategoricalVariableDiscreteValues)
  3. Categorical variables with descriptors (CategoricalVariableWithDescriptors)

Categorical variables without any description (e.g. one-hot encoding) are not currently supported.

Objective types available

  1. Objectives modelled using machine learning models (RegressionObjective)
  2. Calculated objectives using a user-provided function (CalculableObjective)

Classification objectives are not currently supported.
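As an illustration of a calculated objective, the sketch below computes a materials cost directly from the inputs, so no machine learning model is needed. The function name, its arguments, and the unit prices are hypothetical, and how such a function is passed to CalculableObjective is not shown because that signature is not given here.

```python
# Hypothetical unit prices (currency per gram); values are made up for illustration.
PRICE_PER_GRAM = {"reagent_a": 0.50, "catalyst": 12.00}

def materials_cost(reagent_a_grams, catalyst_grams):
    """Exact cost of one experiment, computed directly from its inputs."""
    return (reagent_a_grams * PRICE_PER_GRAM["reagent_a"]
            + catalyst_grams * PRICE_PER_GRAM["catalyst"])

cost = materials_cost(reagent_a_grams=10.0, catalyst_grams=0.25)
print(cost)
```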

User-selectable acquisition functions available

  1. Expected improvement based methods (ExpectedImprovement)

    1. A modified single-objective expected improvement algorithm that is better at exploration than the standard analytical method
    2. A modified multi-objective expected hypervolume improvement algorithm that is better at exploration than the standard analytical method
    3. qNEI and qNEHVI BoTorch methods (only compatible with GP models)
  2. A method based on the unified evolutionary optimisation algorithm U-NSGA-III that derives uncertainty in the inference by sampling from a distribution (NSGAImprovement)

  3. A fully explorative method that identifies the candidates that have the highest uncertainty in the objective predictions (HighestUncertainty)
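For reference, the sketch below implements the standard analytical expected improvement for a maximised objective with a Gaussian prediction, alongside a highest-uncertainty pick. It is the textbook formula, not NEMO's modified algorithm, and the candidate predictions and function name are hypothetical.

```python
import math

def expected_improvement(mean, std, best_so_far):
    """Standard analytical EI for a maximised objective with a Gaussian prediction."""
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf

# Hypothetical model predictions (mean, standard deviation) for three candidates.
predictions = {"A": (45.0, 1.0), "B": (44.0, 6.0), "C": (30.0, 8.0)}
best_so_far = 46.32  # best observed value of the maximised objective

ei = {name: expected_improvement(m, s, best_so_far)
      for name, (m, s) in predictions.items()}
best_by_ei = max(ei, key=ei.get)
# A fully explorative choice ignores the mean and looks only at the uncertainty.
most_uncertain = max(predictions, key=lambda name: predictions[name][1])
print(best_by_ei, most_uncertain)
```

With these made-up numbers the two criteria disagree: expected improvement balances a promising mean against uncertainty, whereas the explorative choice picks the widest prediction regardless of its mean.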

Input constraints available

  1. Linear equality and inequality constraints (LinearConstraint)
  2. Basic non-linear equality and inequality constraints that incorporate an exponent for each input variable (NonLinearPowerConstraint)
  3. Equality and inequality constraints that allow the user to pass a function to calculate the left-hand side of the constraint (FunctionalConstraint)
  4. Stoichiometry constraints that force the ratio between two input variables to be equal to or greater than a specified value (StoichiometricConstraint)
  5. A constraint type to limit the number of active variables (MaxActiveFeaturesConstraint)
  6. A constraint type that prevents certain categorical variable values from being selected simultaneously (CategoricalConstraint)
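The logic behind three of these constraint types can be sketched as plain feasibility checks. The helper functions and candidate values below are hypothetical and independent of NEMO's constraint classes, whose constructors are not described here.

```python
def satisfies_linear(x, coefficients, rhs):
    """Linear inequality: sum(c_i * x_i) <= rhs."""
    return sum(c * v for c, v in zip(coefficients, x)) <= rhs

def satisfies_ratio(x, numerator_index, denominator_index, min_ratio):
    """Stoichiometry-style rule x[num] / x[den] >= min_ratio.

    Written as a product so it also works when the denominator is zero;
    this assumes the denominator input is non-negative.
    """
    return x[numerator_index] >= min_ratio * x[denominator_index]

def satisfies_max_active(x, max_active):
    """At most `max_active` inputs may be non-zero."""
    return sum(1 for v in x if v != 0.0) <= max_active

candidates = [
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 0.5],
    [4.0, 3.0, 0.0],
]
feasible = [
    x for x in candidates
    if satisfies_linear(x, [1.0, 1.0, 1.0], 5.0)  # x0 + x1 + x2 <= 5
    and satisfies_ratio(x, 0, 1, 1.5)             # x0 / x1 >= 1.5
    and satisfies_max_active(x, 2)                # at most two non-zero inputs
]
print(feasible)
```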

Benchmarking functionality available

Benchmark functions are typically used to simulate the outcomes of experiments in a closed-loop manner, so the user is not prompted to input the actual output values of suggested candidates. They are therefore helpful for evaluating the quality of an optimisation (inferred from how effectively the chosen model(s) and/or acquisition function identify the optimum).

  1. Machine learning model based on a provided dataset (ModelBenchmark)
  2. Single objective synthetic functions (SingleObjectiveSyntheticBenchmark)
  3. Multi-objective synthetic functions (MultiObjectiveSyntheticBenchmark)
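A closed loop with a synthetic function can be sketched as follows. The Branin function is a common single-objective test function; whether NEMO ships it is not stated here. Random suggestions stand in for an acquisition function, and the `closed_loop` helper is hypothetical.

```python
import math
import random

def branin(x1, x2):
    """Branin test function; its global minimum value is about 0.397887."""
    a, b, c = 1.0, 5.1 / (4.0 * math.pi ** 2), 5.0 / math.pi
    r, s, t = 6.0, 10.0, 1.0 / (8.0 * math.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1.0 - t) * math.cos(x1) + s

def closed_loop(n_iterations, seed=0):
    """Suggest a candidate, score it with the benchmark, and track the best value."""
    rng = random.Random(seed)
    best = math.inf
    best_history = []
    for _ in range(n_iterations):
        # A random suggestion stands in for an acquisition function.
        x1, x2 = rng.uniform(-5.0, 10.0), rng.uniform(0.0, 15.0)
        # The benchmark replaces the experiment, so no user input is needed.
        best = min(best, branin(x1, x2))
        best_history.append(best)
    return best_history

history = closed_loop(50)
print(history[-1])
```

Because the true optimum of the benchmark is known, the gap between `history[-1]` and 0.397887 gives a direct measure of how well a given strategy performs.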

Sample generators available

Methods for generating samples of parameter values during an optimisation. These can also be used independently, outside of an optimisation, by calling the generate_samples function.

  1. Latin hypercube sampling (with a mixed-integer implementation for efficient sampling of categorical variables) (LatinHyperCubeSampling)
  2. Sobol sampling (SobolSampling)
  3. Polytope sampling (PolytopeSampling)
  4. Random sampling (RandomSampling)
  5. Pool-based sampling using a user-defined set of data points. Typically used as an alternative to a machine learning model benchmark function (PoolBased)
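To show what Latin hypercube sampling does, here is a minimal pure-Python sketch for continuous variables only. It is not NEMO's LatinHyperCubeSampling class (which also handles categorical variables); the function name is hypothetical, and the bounds are borrowed from the four variables in the getting-started example.

```python
import random

def latin_hypercube(bounds, n_samples, seed=None):
    """Draw n_samples points with exactly one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        # One uniform draw inside each of n_samples equal-width strata, then shuffle
        # so that strata are paired at random across dimensions.
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)
        columns.append([lower + u * (upper - lower) for u in unit])
    return [list(row) for row in zip(*columns)]

bounds = [(1.0, 10.0), (0.02, 0.2), (30.0, 70.0), (5.0, 15.0)]
samples = latin_hypercube(bounds, n_samples=5, seed=0)
for row in samples:
    print(row)
```

Compared with plain random sampling, each variable's range is covered evenly even with very few samples, which matters when every sample is an experiment.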

Other utilities/functions available

  1. An included template for providing the dataset, with automated extraction
  2. Scatter and bar chart plotting functionality for displaying model quality and optimisation progress

Getting started

The following code demonstrates how to set up a simple Bayesian optimisation using a user-provided dataset containing four continuous variables (X) and two objectives (Y):

# Import the variable, objectives, sampler, acquisition function, and the optimisation classes
import numpy as np
from nemo_bo.opt.variables import ContinuousVariable, VariablesList
from nemo_bo.opt.objectives import RegressionObjective, ObjectivesList
from nemo_bo.acquisition_functions.expected_improvement.expected_improvement import ExpectedImprovement
from nemo_bo.opt.samplers import LatinHyperCubeSampling
from nemo_bo.opt.optimisation import Optimisation

# Create the variable objects
var1 = ContinuousVariable(name="variable1", lower_bound=1.0, upper_bound=10.0)
var2 = ContinuousVariable(name="variable2", lower_bound=0.02, upper_bound=0.2)
var3 = ContinuousVariable(name="variable3", lower_bound=30.0, upper_bound=70.0)
var4 = ContinuousVariable(name="variable4", lower_bound=5.0, upper_bound=15.0)
var_list = VariablesList([var1, var2, var3, var4])

# Create the objective objects
obj1 = RegressionObjective(
    name="objective1",
    obj_max_bool=True, # obj_max_bool when True defines the objective is to be maximised
    lower_bound=0.0,
    upper_bound=100.0,
    predictor_type=["gp", "xgb"],
)
obj2 = RegressionObjective(
    name="objective2",
    obj_max_bool=False, # obj_max_bool when False defines the objective is to be minimised
    lower_bound=0.01,
    upper_bound=0.15,
    predictor_type=["gp", "xgb"],
)
obj_list = ObjectivesList([obj1, obj2])

# Instantiate the sampler
sampler = LatinHyperCubeSampling()

# Instantiate the acquisition function
acq_func = ExpectedImprovement(num_candidates=4) # num_candidates defines how many sets of parameters to return at each optimisation iteration

# Set up the optimisation instance
# opt_name is used to store the optimisation information in a sub-folder with this name
optimisation = Optimisation(var_list, obj_list, acq_func, sampler=sampler, opt_name="README optimisation")

# Start the optimisation using the convenient run function that will run for the specified number of iterations
# X and Y arrays represent an initial user-provided dataset
X = np.array(
    [
        [6.82, 0.16, 34, 6.2],
        [6.15, 0.08, 47, 8.5],
        [4.92, 0.05, 32, 11.1],
        [9.24, 0.15, 41, 12.1],
        [1.07, 0.12, 67, 8.2],
        [5.66, 0.09, 53, 12.7],
        [8.08, 0.19, 54, 5.4],
        [1.87, 0.11, 68, 9.2],
        [4.08, 0.13, 58, 10.4],
        [4.38, 0.18, 36, 14.6],
    ]
)
Y = np.array(
    [
        [33.31, 0.12],
        [41.89, 0.10],
        [36.87, 0.09],
        [46.32, 0.13],
        [0.00, 0.09],
        [36.52, 0.10],
        [45.77, 0.14],
        [0.00, 0.09],
        [30.95, 0.11],
        [34.89, 0.12],
    ]
)
optimisation_data = optimisation.run(X, Y, number_of_iterations=10)

# During the optimisation, after candidates have been suggested, the user will be prompted to input the actual output 
# values into the python console. At this point, the model information, optimisation progress, and candidates have been 
# saved and the user can either choose to leave the python console open whilst they obtain the results, or they can 
# stop the python process, and then resume the optimisation and input the values at a more convenient time later

# After the actual output values have been inputted, the optimisation run will be saved again, and then the next
# iteration starts automatically
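Since the aim of a multi-objective run is to approach the Pareto front, it can help to see which of the initial data points are already non-dominated. The sketch below is independent of NEMO: the helper functions are hypothetical, and the Y values from the example above are copied into a plain list so that the snippet runs without NumPy. The first objective is maximised and the second minimised, matching the obj_max_bool settings.

```python
def dominates(a, b, maximise):
    """True if a is at least as good as b in every objective and strictly better in one."""
    at_least_as_good = all(
        (x >= y) if is_max else (x <= y) for x, y, is_max in zip(a, b, maximise)
    )
    strictly_better = any(
        (x > y) if is_max else (x < y) for x, y, is_max in zip(a, b, maximise)
    )
    return at_least_as_good and strictly_better

def pareto_indices(points, maximise):
    """Indices of the points that no other point dominates."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p, maximise) for q in points)
    ]

Y_initial = [
    [33.31, 0.12], [41.89, 0.10], [36.87, 0.09], [46.32, 0.13], [0.00, 0.09],
    [36.52, 0.10], [45.77, 0.14], [0.00, 0.09], [30.95, 0.11], [34.89, 0.12],
]
front = pareto_indices(Y_initial, maximise=[True, False])
print(front)  # [1, 2, 3]
```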

More tutorials

We encourage you to look through the tutorials in the tutorials folder to see how to use some other NEMO functions.

  1. How to select specific machine learning model types for the objectives
  2. Setting up a single objective optimisation
  3. How to use calculable objectives
  4. How to define transformers for variables and objectives
  5. How to define categorical variables with descriptors
  6. Utilising the machine learning model fitting in NEMO without Bayesian optimisation
  7. How to create a closed-loop optimisation using a machine learning model as the benchmark function
  8. How to create a closed-loop optimisation using a multi-objective synthetic function as the benchmark function
  9. How to create a closed-loop optimisation using a single objective synthetic function as the benchmark function
  10. How to create a closed-loop optimisation using a pool-based sampler as the benchmark
  11. Setting up an optimisation with input constraints
  12. Generating samples without needing to perform an optimisation
  13. How to set up a manual optimisation
  14. How to resume an optimisation run
  15. How to use the BoTorch (quasi-) Monte-Carlo based acquisition functions in NEMO
  16. How to set up an optimisation that uses U-NSGA-III as the acquisition function
  17. Using the Excel input template file to import the variables and objectives data
  18. How to set up an optimisation that uses the highest uncertainty acquisition function

What to do if you find any issues?

Leave a message in the issues section and we will get back to you as soon as we can.

Acknowledgements

Much of the functionality in NEMO is built on top of the work by the authors of the features we incorporate. We are grateful to them for continuously supporting their libraries and establishing their platforms for optimisation work. We reference the works throughout the .py files.
