Multi-objective optimization of chemical processes with automated machine learning workflows

Nomadic Exploratory Multi-objective Optimisation (NEMO)

Meet NEMO - our ‘Nomadic Explorer’. NEMO is quite the connoisseur when it comes to machine learning optimisation - only the best model types and model parameters will suffice. In the case of a single dataset, NEMO will scour the lands for the optimal model type and parameters to fit a given dataset. A range of outputs will then be generated for you to assess, interpret and utilise your newly created model.

If you decide to take your analyses a step further into the realms of multi-objective Bayesian optimisation, then our Nomadic Explorer will tirelessly search for the best model type and parameters at each iteration of the optimisation. At each stage, the optimal set of conditions will be provided to aid your pursuit of the elusive multi-dimensional Pareto front.

NEMO is prepared for the journey with a cavernous bag of tools. However, if your aspirations are more exotic, then NEMO supports the inclusion of custom models, samplers and functions.

Check out the examples to see NEMO in action and get started with your own ML workflows.

What is NEMO?

NEMO is a package designed for Bayesian optimisation of one or multiple objectives simultaneously, with a focus on chemical processes.
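When several objectives are optimised at once, there is usually no single best result but a set of trade-offs, the Pareto front. The following is a minimal, library-independent sketch of how non-dominated results can be identified; the function names are illustrative and are not part of the NEMO API.

```python
def dominates(a, b, maximise):
    """Return True if objective vector a is at least as good as b in every
    objective and strictly better in at least one."""
    at_least_as_good = all(
        (x >= y) if mx else (x <= y) for x, y, mx in zip(a, b, maximise)
    )
    strictly_better = any(
        (x > y) if mx else (x < y) for x, y, mx in zip(a, b, maximise)
    )
    return at_least_as_good and strictly_better


def pareto_front(points, maximise):
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p, maximise) for q in points)]


# Example results: the first objective is maximised, the second is minimised
results = [(33.31, 0.12), (41.89, 0.10), (36.87, 0.09), (46.32, 0.13), (45.77, 0.14)]
front = pareto_front(results, maximise=(True, False))
print(front)  # [(41.89, 0.1), (36.87, 0.09), (46.32, 0.13)]
```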

Installation

To install NEMO via pip:

pip install nemo-bo

How does NEMO work?

First, the parameters (variables) and targets (objectives) of a chemical process are provided to the algorithm. Given a dataset from prior experiments, NEMO identifies the relationship between the parameters and targets and then suggests the parameters to try in the next optimisation iteration.

In contrast to other open-source optimisation libraries, NEMO automatically optimises the hyperparameters of various machine learning models and selects the one with the best predictive accuracy for a given objective. This ensures that the model is continuously optimised over the course of an optimisation campaign. Furthermore, NEMO natively supports objectives that can be calculated directly when the exact relationship between the parameters and the target is known (e.g. materials cost).
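The idea of selecting the model with the best predictive accuracy can be illustrated with a small, self-contained sketch. It compares two toy models by leave-one-out error and keeps the better one. This is only an illustration under simple assumptions (one input, two hand-written models, RMSE as the metric); the models, metric, and hyperparameter search that NEMO actually uses are not shown here.

```python
def fit_mean(xs, ys):
    """Toy model 1: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Toy model 2: least-squares straight line through the training data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def loo_rmse(fit, xs, ys):
    """Leave-one-out root-mean-square error of a fitting procedure."""
    squared_errors = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_errors.append((model(xs[i]) - ys[i]) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Made-up data with a roughly linear trend
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: loo_rmse(fit, xs, ys) for name, fit in candidates.items()}
best_model = min(scores, key=scores.get)  # the model with the lowest error is kept
```

Repeating such a comparison at every iteration is what allows the chosen model type to change as more data is collected.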

What features are in NEMO?

NEMO includes many machine learning models, acquisition functions, constraints, and sample generators. The base classes for each of these are also included and can be used as templates for adding your own custom solutions.

The features natively found in NEMO are the following:

Resuming an optimisation

At every iteration, the optimisation progress and ML model information are saved at two points:

First, when candidates have been suggested
Second, after the new results have been entered, at the end of the iteration

This allows users to resume an optimisation run from either of these two points.
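The two save points can be pictured with the following sketch, which writes a small state file at each point and picks the most recent one when resuming. The file names, the JSON format, and the helper functions are hypothetical and chosen only for illustration; they do not describe how NEMO stores its data.

```python
import json
import tempfile
from pathlib import Path

# Order of the two save points within one iteration (hypothetical stage names)
STAGE_ORDER = {"candidates_suggested": 0, "results_entered": 1}


def save_checkpoint(folder, iteration, stage, payload):
    """Write the state for one save point to its own JSON file."""
    path = Path(folder) / f"iteration_{iteration}_{stage}.json"
    path.write_text(json.dumps({"iteration": iteration, "stage": stage, **payload}))
    return path


def latest_checkpoint(folder):
    """Return the most recent saved state, or None if nothing has been saved."""
    states = [json.loads(p.read_text()) for p in Path(folder).glob("iteration_*.json")]
    if not states:
        return None
    return max(states, key=lambda s: (s["iteration"], STAGE_ORDER[s["stage"]]))


with tempfile.TemporaryDirectory() as folder:
    save_checkpoint(folder, 1, "candidates_suggested", {"candidates": [[5.0, 0.1, 50.0, 10.0]]})
    save_checkpoint(folder, 1, "results_entered", {"results": [[40.0, 0.11]]})
    save_checkpoint(folder, 2, "candidates_suggested", {"candidates": [[6.0, 0.12, 45.0, 9.0]]})
    # The process could stop here; on restart the run resumes from this state
    state = latest_checkpoint(folder)
```

In this example the run would resume in iteration 2 with candidates already suggested, so the next step is to enter the measured results.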

Machine learning models available

Gaussian processes (GPs) using the BoTorch library
Various neural networks from the Deeply Uncertain code repository:
1. Bayesian neural networks
2. Concrete dropout
3. Deep ensembles
Various decision-tree based models:
1. XGBoost Distribution
2. NGBoost
3. Random Forest using the forest-confidence-interval package

Variable types available

Continuous variables (ContinuousVariable)
Categorical variables with discrete values (CategoricalVariableDiscreteValues)
Categorical variables with descriptors (CategoricalVariableWithDescriptors)

Categorical variables without any description (e.g. one-hot encoding) are not currently supported
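To illustrate what describing a categorical variable with descriptors means, the sketch below replaces each category with a numeric descriptor vector and maps a suggested vector back to the closest category. The category names, descriptor values, and helper functions are made up for this example and are not NEMO's API.

```python
# Hypothetical descriptor table: each category has two made-up numeric descriptors
SOLVENT_DESCRIPTORS = {
    "solvent_A": (0.10, 1.5),
    "solvent_B": (0.45, 0.8),
    "solvent_C": (0.90, 2.2),
}


def encode(category):
    """Replace a category label with its descriptor vector."""
    return list(SOLVENT_DESCRIPTORS[category])


def nearest_category(descriptor_vector):
    """Map a point in descriptor space back to the closest known category."""
    def squared_distance(name):
        return sum(
            (a - b) ** 2 for a, b in zip(SOLVENT_DESCRIPTORS[name], descriptor_vector)
        )
    return min(SOLVENT_DESCRIPTORS, key=squared_distance)


encoded = encode("solvent_B")
suggested = nearest_category([0.5, 1.0])
print(encoded, suggested)  # [0.45, 0.8] solvent_B
```

In practice, descriptors on very different scales would usually be normalised before distances are compared.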

Objective types available

Objectives modelled using machine learning models (RegressionObjective)
Calculated objectives using a user-provided function (CalculableObjective)

Classification objectives are not currently supported
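A calculated objective needs no model because its value follows directly from the parameters. The sketch below shows the kind of user-provided function this refers to, using materials cost as the example. The prices, amounts, and function name are hypothetical, and how such a function is passed to CalculableObjective is not shown here.

```python
def materials_cost(amounts_mol, price_per_mol):
    """Total cost of a candidate: sum of amount times unit price for each material."""
    return sum(amounts_mol[name] * price_per_mol[name] for name in amounts_mol)


# Hypothetical unit prices (currency per mol) and one candidate set of amounts (mol)
prices = {"substrate": 12.0, "reagent": 3.5, "catalyst": 250.0}
candidate = {"substrate": 1.0, "reagent": 2.0, "catalyst": 0.02}

cost = materials_cost(candidate, prices)
print(round(cost, 2))  # 24.0
```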

User-selectable acquisition functions available

Expected improvement based methods (ExpectedImprovement)
1. A modified single-objective expected improvement algorithm that is better at exploration than the standard analytical method
2. A modified multi-objective expected hypervolume improvement algorithm that is better at exploration than the standard analytical method
3. qNEI and qNEHVI BoTorch methods (only compatible with GP models)
A method based on U-NSGA-III, a unified evolutionary optimisation algorithm, that derives uncertainty in the inference by sampling from a distribution (NSGAImprovement)
A fully explorative method that identifies the candidates that have the highest uncertainty in the objective predictions (HighestUncertainty)
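For orientation, the sketch below implements the standard analytical expected improvement for a maximised objective with a Gaussian prediction, alongside a purely explorative choice based on the largest predicted uncertainty. It is the textbook formula, not NEMO's modified algorithm, and the candidate names and predictions are made up.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Standard analytical EI for a maximised objective:
    EI = (mean - best) * Phi(z) + std * phi(z), with z = (mean - best) / std."""
    if std <= 0.0:
        return max(mean - best, 0.0)
    z = (mean - best) / std
    return (mean - best) * normal_cdf(z) + std * normal_pdf(z)


# Hypothetical model predictions for three candidates: (mean, standard deviation)
predictions = {
    "candidate_a": (45.0, 1.0),
    "candidate_b": (44.0, 6.0),
    "candidate_c": (20.0, 8.0),
}
best_so_far = 46.32

ei = {name: expected_improvement(m, s, best_so_far) for name, (m, s) in predictions.items()}
best_by_ei = max(ei, key=ei.get)  # balances predicted value and uncertainty
most_uncertain = max(predictions, key=lambda name: predictions[name][1])  # pure exploration
```

Here the two criteria disagree: expected improvement favours candidate_b, whose mean is close to the best result and whose uncertainty is large, while the uncertainty-only choice picks candidate_c despite its poor predicted value.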

Input constraints available

Linear equality and inequality constraints (LinearConstraint)
Basic non-linear equality and inequality constraints that incorporate an exponent for each input variable (NonLinearPowerConstraint)
Equality and inequality constraints that allow the user to pass a function to calculate the left-hand side of the constraint (FunctionalConstraint)
Stoichiometry constraints that force the ratio between two input variables to be equal to or greater than a specified value (StoichiometricConstraint)
A constraint type to limit the number of active variables (MaxActiveFeaturesConstraint)
A constraint type that prevents certain categorical values from being selected simultaneously (CategoricalConstraint)
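As a rough illustration of what two of these constraint types express, the sketch below filters candidates with a linear inequality and a minimum ratio between two inputs. The variable names, limits, and helper functions are hypothetical; the constructor arguments of the NEMO constraint classes are not shown here.

```python
def satisfies_linear(candidate, coefficients, upper_limit):
    """Linear inequality: sum of coefficient * value must not exceed upper_limit."""
    total = sum(coeff * candidate[name] for name, coeff in coefficients.items())
    return total <= upper_limit


def satisfies_ratio(candidate, numerator, denominator, min_ratio):
    """Stoichiometry-style rule: numerator / denominator >= min_ratio,
    written without a division so a zero denominator is handled safely."""
    return candidate[numerator] >= min_ratio * candidate[denominator]


# Hypothetical candidates (equivalents of two reagents)
candidates = [
    {"reagent_a": 1.0, "reagent_b": 2.5},  # satisfies both rules
    {"reagent_a": 1.5, "reagent_b": 2.0},  # ratio b/a is below 2
    {"reagent_a": 1.0, "reagent_b": 3.5},  # a + b exceeds 4
]

feasible = [
    c for c in candidates
    if satisfies_linear(c, {"reagent_a": 1.0, "reagent_b": 1.0}, upper_limit=4.0)
    and satisfies_ratio(c, "reagent_b", "reagent_a", min_ratio=2.0)
]
print(feasible)  # [{'reagent_a': 1.0, 'reagent_b': 2.5}]
```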

Benchmarking functionality available

Benchmark functions are typically used to simulate the outcomes of experiments in a closed-loop manner, so the user is not prompted to input the actual output values of suggested candidates. They can therefore help to evaluate the quality of an optimisation, inferred from how effectively the chosen model(s) and/or acquisition function identify the optimum.

Machine learning model based on a provided dataset (ModelBenchmark)
Single objective synthetic functions (SingleObjectiveSyntheticBenchmark)
Multi-objective synthetic functions (MultiObjectiveSyntheticBenchmark)
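The closed-loop idea can be sketched without any library: a synthetic function stands in for the experiment, so each suggested candidate is evaluated automatically instead of waiting for user input. The test function and the random suggestion step below are placeholders chosen for brevity; they are not one of NEMO's benchmark classes or acquisition functions.

```python
import random


def synthetic_benchmark(x):
    """Hypothetical single-objective test function with a minimum of 0 at (2, 2)."""
    return sum((xi - 2.0) ** 2 for xi in x)


def suggest_candidates(rng, bounds, num_candidates):
    """Placeholder for an acquisition function: uniform random suggestions."""
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(num_candidates)]


rng = random.Random(0)
bounds = [(0.0, 5.0), (0.0, 5.0)]

# Initial dataset, evaluated by the benchmark instead of by experiment
X = suggest_candidates(rng, bounds, 4)
Y = [synthetic_benchmark(x) for x in X]
best_history = [min(Y)]

for _ in range(10):
    new_X = suggest_candidates(rng, bounds, 2)
    new_Y = [synthetic_benchmark(x) for x in new_X]  # no user prompt needed
    X.extend(new_X)
    Y.extend(new_Y)
    best_history.append(min(Y))  # best value found so far (minimisation)
```

Comparing how quickly best_history falls for different models or acquisition functions is one way to judge them against each other.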

Sample generators available

Methods for generating samples of parameter values during an optimisation. These can also be used independently, outside of an optimisation, by calling the generate_samples function.

Latin hypercube sampling (with a mixed-integer implementation for efficient sampling of categorical variables) (LatinHyperCubeSampling)
Sobol sampling (SobolSampling)
Polytope sampling (PolytopeSampling)
Random sampling (RandomSampling)
Pool-based sampling using a user-defined set of data points. Typically used as an alternative to a machine learning model benchmark function (PoolBased)
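As background on the first of these methods, the sketch below is a basic Latin hypercube sampler for continuous variables written with the standard library only. It does not include the mixed-integer handling of categorical variables mentioned above, and the function name and arguments are illustrative rather than NEMO's generate_samples signature.

```python
import random


def latin_hypercube(num_samples, bounds, seed=None):
    """Basic Latin hypercube sampling: each variable's range is split into
    num_samples equal strata and every stratum is used exactly once."""
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        # One uniform draw inside each stratum of the unit interval
        unit = [(i + rng.random()) / num_samples for i in range(num_samples)]
        # Shuffle so strata are paired randomly across variables
        rng.shuffle(unit)
        columns.append([lower + u * (upper - lower) for u in unit])
    return [list(row) for row in zip(*columns)]


# Bounds of four continuous variables
bounds = [(1.0, 10.0), (0.02, 0.2), (30.0, 70.0), (5.0, 15.0)]
samples = latin_hypercube(5, bounds, seed=0)
```

Each row of samples is one set of parameter values, and each column covers its variable's range evenly.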

Other utilities/functions available

An included template for providing the dataset, with automated extraction
Scatter and bar chart plotting functionality for displaying model quality and optimisation progress

Getting started

The following code demonstrates how to set up a simple Bayesian optimisation using a user-provided dataset containing four continuous variables (X) and two objectives (Y):

# Import the variable, objectives, sampler, acquisition function, and the optimisation classes
import numpy as np
from nemo_bo.opt.variables import ContinuousVariable, VariablesList
from nemo_bo.opt.objectives import RegressionObjective, ObjectivesList
from nemo_bo.acquisition_functions.expected_improvement.expected_improvement import ExpectedImprovement
from nemo_bo.opt.samplers import LatinHyperCubeSampling
from nemo_bo.opt.optimisation import Optimisation

# Create the variable objects
var1 = ContinuousVariable(name="variable1", lower_bound=1.0, upper_bound=10.0)
var2 = ContinuousVariable(name="variable2", lower_bound=0.02, upper_bound=0.2)
var3 = ContinuousVariable(name="variable3", lower_bound=30.0, upper_bound=70.0)
var4 = ContinuousVariable(name="variable4", lower_bound=5.0, upper_bound=15.0)
var_list = VariablesList([var1, var2, var3, var4])

# Create the objective objects
obj1 = RegressionObjective(
    name="objective1",
    obj_max_bool=True, # obj_max_bool when True defines the objective is to be maximised
    lower_bound=0.0,
    upper_bound=100.0,
    predictor_type=["gp", "xgb"],
)
obj2 = RegressionObjective(
    name="objective2",
    obj_max_bool=False, # obj_max_bool when False defines the objective is to be minimised
    lower_bound=0.01,
    upper_bound=0.15,
    predictor_type=["gp", "xgb"],
)
obj_list = ObjectivesList([obj1, obj2])

# Instantiate the sampler
sampler = LatinHyperCubeSampling()

# Instantiate the acquisition function
acq_func = ExpectedImprovement(num_candidates=4) # num_candidates defines how many sets of parameters to return at each optimisation iteration

# Set up the optimisation instance
# opt_name is used to store the optimisation information in a sub-folder with this name
optimisation = Optimisation(var_list, obj_list, acq_func, sampler=sampler, opt_name="README optimisation")

# Start the optimisation using the convenient run function that will run for the specified number of iterations
# X and Y arrays represent an initial user-provided dataset
X = np.array(
    [
        [6.82, 0.16, 34, 6.2],
        [6.15, 0.08, 47, 8.5],
        [4.92, 0.05, 32, 11.1],
        [9.24, 0.15, 41, 12.1],
        [1.07, 0.12, 67, 8.2],
        [5.66, 0.09, 53, 12.7],
        [8.08, 0.19, 54, 5.4],
        [1.87, 0.11, 68, 9.2],
        [4.08, 0.13, 58, 10.4],
        [4.38, 0.18, 36, 14.6],
    ]
)
Y = np.array(
    [
        [33.31, 0.12],
        [41.89, 0.10],
        [36.87, 0.09],
        [46.32, 0.13],
        [0.00, 0.09],
        [36.52, 0.10],
        [45.77, 0.14],
        [0.00, 0.09],
        [30.95, 0.11],
        [34.89, 0.12],
    ]
)
optimisation_data = optimisation.run(X, Y, number_of_iterations=10)

# During the optimisation, after candidates have been suggested, the user will be prompted to input the actual output
# values into the Python console. At this point, the model information, optimisation progress, and candidates have been
# saved and the user can either choose to leave the Python console open whilst they obtain the results, or they can
# stop the Python process, and then resume the optimisation and input the values at a more convenient time later.

# After the actual output values have been inputted, the optimisation run will be saved again, and then the next
# iteration starts automatically

What to do if you find any issues?

Leave a message in the issues section and we will get back to you as soon as we can.

Acknowledgements

Much of the functionality in NEMO is built on top of the work by the authors of the features we incorporate. We are grateful to them for continuously supporting their libraries and establishing their platforms for optimisation work. We reference the works throughout the .py files.

Release history

0.1.16 (Apr 24, 2023)
0.1.15 (Apr 13, 2023)
0.1.14 (Oct 8, 2022)
0.1.13 (Oct 8, 2022)
0.1.12 (Oct 8, 2022)
0.1.11 (Oct 8, 2022)
0.1.10 (Oct 8, 2022)
0.1.9 (Oct 8, 2022)
0.1.8 (Aug 28, 2022)
0.1.7 (Jul 23, 2022)
0.1.6 (Jun 15, 2022)
0.1.5 (Jun 15, 2022)
0.1.4 (Jun 14, 2022)
0.1.3 (Jun 14, 2022)
0.1.2 (Jun 14, 2022)
0.1.1 (Jun 14, 2022)
0.1.0 (Jun 14, 2022)