sklearn-instrumentation

scikit-learn instrumentation tooling

Generalized instrumentation tooling for scikit-learn models. sklearn_instrumentation can instrument the sklearn package itself and any scikit-learn-compatible package whose estimators and transformers inherit from sklearn.base.BaseEstimator.

Instrumentation applies decorators to methods of BaseEstimator-derived classes or instances. By default, the instrumentor decorates the following methods, except where an instance exposes one of them as a property:

fit
fit_transform
predict
predict_log_proba
predict_proba
transform
_fit
_fit_transform
_predict
_predict_log_proba
_predict_proba
_transform
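Which of these names actually get decorated depends on what a given class defines. The following is a minimal plain-Python sketch of that selection step, assuming the rule is "the name exists on the object and is not a property". The helper methods_to_instrument and the class ToyClassifier are hypothetical illustrations, not part of the sklearn_instrumentation API.

```python
import inspect

# The default method names listed above.
DEFAULT_METHODS = (
    "fit", "fit_transform", "predict", "predict_log_proba", "predict_proba",
    "transform", "_fit", "_fit_transform", "_predict", "_predict_log_proba",
    "_predict_proba", "_transform",
)


def methods_to_instrument(obj, methods=DEFAULT_METHODS):
    """Hypothetical helper: return the default method names that ``obj``
    exposes as plain methods, skipping names implemented as properties."""
    selected = []
    for name in methods:
        # getattr_static looks the name up without running descriptors,
        # so a property comes back as the property object itself.
        attr = inspect.getattr_static(obj, name, None)
        if attr is None or isinstance(attr, property):
            continue
        selected.append(name)
    return selected


class ToyClassifier:
    """Hypothetical estimator-like class used only for this illustration."""

    def fit(self, X, y):
        return self

    def predict(self, X):
        return self._predict(X)

    def _predict(self, X):
        return [0 for _ in X]

    @property
    def predict_proba(self):
        # Exposed as a property, so the selection step leaves it alone.
        raise AttributeError("predict_proba is not available")


print(methods_to_instrument(ToyClassifier()))  # ['fit', 'predict', '_predict']
```

Using inspect.getattr_static rather than getattr matters here: a plain getattr would evaluate the property and, in this toy case, raise AttributeError.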

sklearn-instrumentation supports instrumenting entire sklearn-compatible packages, as well as recursively instrumenting individual models (metaestimators such as Pipeline, or even single estimators such as RandomForestClassifier).
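Recursive instrumentation first has to find every estimator nested inside a model. A minimal plain-Python sketch of such a traversal follows, assuming children are reachable through instance attributes, lists, tuples, and dicts (roughly how Pipeline.steps and FeatureUnion.transformer_list hold them). The Toy* classes and iter_estimators are hypothetical stand-ins; this is not the package's actual traversal logic.

```python
class ToyEstimator:
    """Hypothetical stand-in for sklearn.base.BaseEstimator."""


class ToyScaler(ToyEstimator):
    pass


class ToyPCA(ToyEstimator):
    pass


class ToyTree(ToyEstimator):
    pass


class ToyForest(ToyEstimator):
    def __init__(self, estimators):
        # Fitted sub-estimators, loosely mirroring RandomForestClassifier.estimators_
        self.estimators_ = estimators


class ToyUnion(ToyEstimator):
    def __init__(self, transformer_list):
        self.transformer_list = transformer_list


class ToyPipeline(ToyEstimator):
    def __init__(self, steps):
        self.steps = steps


def iter_estimators(obj, _seen=None):
    """Yield ``obj`` and every ToyEstimator reachable through its attributes."""
    seen = set() if _seen is None else _seen
    if id(obj) in seen:
        return  # already visited; avoids duplicates and cycles
    seen.add(id(obj))
    yield obj
    for value in vars(obj).values():
        yield from _iter_children(value, seen)


def _iter_children(value, seen):
    # Containers such as Pipeline.steps hold (name, estimator) tuples.
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from _iter_children(item, seen)
    elif isinstance(value, dict):
        for item in value.values():
            yield from _iter_children(item, seen)
    elif isinstance(value, ToyEstimator):
        yield from iter_estimators(value, seen)


model = ToyPipeline(steps=[
    ("fu", ToyUnion([("ss", ToyScaler()), ("pca", ToyPCA())])),
    ("rf", ToyForest([ToyTree(), ToyTree()])),
])
print([type(e).__name__ for e in iter_estimators(model)])
# ['ToyPipeline', 'ToyUnion', 'ToyScaler', 'ToyPCA', 'ToyForest', 'ToyTree', 'ToyTree']
```

Each estimator yielded by the walk could then have its methods decorated in turn.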

Installation

The sklearn-instrumentation package is available on PyPI and can be installed with pip:

pip install sklearn-instrumentation

Package instrumentation

Instrument any sklearn compatible package that has BaseEstimator-derived classes.

from sklearn_instrumentation import SklearnInstrumentor

instrumentor = SklearnInstrumentor(instrument=my_instrument)
instrumentor.instrument_packages(["sklearn", "xgboost", "lightgbm"])

Full example:

import logging

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklearn_instrumentation import SklearnInstrumentor
from sklearn_instrumentation.instruments.logging import TimeElapsedLogger

logging.basicConfig(level=logging.INFO)

# Create an instrumentor and instrument sklearn
instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())
instrumentor.instrument_packages(["sklearn"])

# Create a toy model for classification
ss = StandardScaler()
pca = PCA(n_components=3)
rf = RandomForestClassifier()
classification_model = Pipeline(
    steps=[
        (
            "fu",
            FeatureUnion(
                transformer_list=[
                    ("ss", ss),
                    ("pca", pca),
                ]
            ),
        ),
        ("rf", rf),
    ]
)
X, y = load_iris(return_X_y=True)

# Observe logging
classification_model.fit(X, y)
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit elapsed time: 0.0006406307220458984 seconds
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.0001430511474609375 seconds
# INFO:sklearn_instrumentation.instruments.logging:PCA._fit starting.
# INFO:sklearn_instrumentation.instruments.logging:PCA._fit elapsed time: 0.0006711483001708984 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit elapsed time: 0.0026731491088867188 seconds
# INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit elapsed time: 0.1768970489501953 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit elapsed time: 0.17983102798461914 seconds

# Observe logging
classification_model.predict(X)
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.00024509429931640625 seconds
# INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform elapsed time: 0.0002181529998779297 seconds
# INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform elapsed time: 0.0012080669403076172 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.013531208038330078 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.013692140579223633 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict elapsed time: 0.015219926834106445 seconds

# Remove instrumentation
instrumentor.uninstrument_packages(["sklearn"])

# Observe no logging
classification_model.predict(X)

Instance instrumentation

Instrument any trained scikit-learn-compatible estimator or metaestimator instance.

from sklearn_instrumentation import SklearnInstrumentor

instrumentor = SklearnInstrumentor(instrument=my_instrument)
instrumentor.instrument_instance(estimator=my_ml_pipeline)

Example:

import logging

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

from sklearn_instrumentation import SklearnInstrumentor
from sklearn_instrumentation.instruments.logging import TimeElapsedLogger

logging.basicConfig(level=logging.INFO)

# Train a classifier
X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier()

rf.fit(X, y)

# Create an instrumentor which decorates BaseEstimator methods with
# logging output when entering and exiting methods, with time elapsed logged
# on exit.
instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())

# Apply the decorator to the methods of the rf instance
instrumentor.instrument_instance(rf)

# Observe the logging output
rf.predict(X)
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.014165163040161133 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.014327764511108398 seconds

# Remove the decorator from the rf instance
instrumentor.uninstrument_instance(rf)

# No more logging
rf.predict(X)
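One plausible mechanism for instance-level instrumentation is to store a wrapped bound method as an instance attribute, which shadows the class method for that object only; deleting the attribute restores normal lookup. A minimal plain-Python sketch, assuming this mechanism (the package may implement it differently). The helpers instrument_instance, uninstrument_instance, and count_calls here are hypothetical standalone functions, not the SklearnInstrumentor methods of the same name.

```python
import functools


class ToyClassifier:
    """Hypothetical estimator-like class used only for this illustration."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        return [self.classes_[0] for _ in X]


def count_calls(counter):
    """Build a toy instrument that counts calls per method name in ``counter``."""
    def instrument(estimator, func, **dkwargs):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            counter[func.__name__] = counter.get(func.__name__, 0) + 1
            return func(*args, **kwargs)
        return wrapper
    return instrument


def instrument_instance(estimator, instrument, methods=("fit", "predict")):
    """Hypothetical helper: shadow class methods with wrapped bound methods."""
    for name in methods:
        bound = getattr(estimator, name, None)
        if callable(bound):
            # An instance attribute takes precedence over the class attribute.
            setattr(estimator, name, instrument(estimator, bound))


def uninstrument_instance(estimator, methods=("fit", "predict")):
    """Drop the instance attributes so lookups fall back to the class."""
    for name in methods:
        estimator.__dict__.pop(name, None)


counts = {}
clf = ToyClassifier().fit([[0], [1]], ["a", "b"])
instrument_instance(clf, count_calls(counts))
clf.predict([[0], [1]])
ToyClassifier().fit([[0]], ["a"]).predict([[0]])  # other instances are untouched
uninstrument_instance(clf)
clf.predict([[0]])  # back to the plain class method
print(counts)  # {'predict': 1}
```

Because only the one object is patched, other instances of the same class, including copies built from it, stay uninstrumented.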

Instance class instrumentation

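During fitting, some metaestimators copy estimator instances using scikit-learn’s clone function. Because clone builds a new instance from the estimator's parameters, instrumentation applied to the original instance is not carried over, so the cloned, fitted estimators are not instrumented. To work around this, instrument the classes of the instances rather than the instances themselves.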

import logging

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import FeatureUnion
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from sklearn_instrumentation import SklearnInstrumentor
from sklearn_instrumentation.instruments.logging import TimeElapsedLogger

logging.basicConfig(level=logging.INFO)

ss = StandardScaler()
pca = PCA(n_components=3)
rf = RandomForestClassifier()
classification_model = Pipeline(
    steps=[
        (
            "fu",
            FeatureUnion(
                transformer_list=[
                    ("ss", ss),
                    ("pca", pca),
                ]
            ),
        ),
        ("rf", rf),
    ]
)
X, y = load_iris(return_X_y=True)

instrumentor = SklearnInstrumentor(instrument=TimeElapsedLogger())
instrumentor.instrument_instance_classes(classification_model)

classification_model.fit(X, y)
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit starting.
# INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit elapsed time: 0.0006749629974365234 seconds
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.fit elapsed time: 0.0007731914520263672 seconds
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.00016427040100097656 seconds
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.0002810955047607422 seconds
# INFO:sklearn_instrumentation.instruments.logging:PCA._fit starting.
# INFO:sklearn_instrumentation.instruments.logging:PCA._fit starting.
# INFO:sklearn_instrumentation.instruments.logging:PCA._fit elapsed time: 0.0004239082336425781 seconds
# INFO:sklearn_instrumentation.instruments.logging:PCA._fit elapsed time: 0.0005612373352050781 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit elapsed time: 0.002705097198486328 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline._fit elapsed time: 0.002802133560180664 seconds
# INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit starting.
# INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit elapsed time: 0.16085195541381836 seconds
# INFO:sklearn_instrumentation.instruments.logging:BaseForest.fit elapsed time: 0.16097569465637207 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit elapsed time: 0.1639721393585205 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.fit elapsed time: 0.16404390335083008 seconds

classification_model.predict(X)
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.0001049041748046875 seconds
# INFO:sklearn_instrumentation.instruments.logging:StandardScaler.transform elapsed time: 0.00017309188842773438 seconds
# INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform starting.
# INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform elapsed time: 0.0001690387725830078 seconds
# INFO:sklearn_instrumentation.instruments.logging:_BasePCA.transform elapsed time: 0.00023698806762695312 seconds
# INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform elapsed time: 0.0008630752563476562 seconds
# INFO:sklearn_instrumentation.instruments.logging:FeatureUnion.transform elapsed time: 0.0009222030639648438 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba starting.
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.01138925552368164 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict_proba elapsed time: 0.011497974395751953 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.011577844619750977 seconds
# INFO:sklearn_instrumentation.instruments.logging:ForestClassifier.predict elapsed time: 0.011635780334472656 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict elapsed time: 0.012682199478149414 seconds
# INFO:sklearn_instrumentation.instruments.logging:Pipeline.predict elapsed time: 0.012733936309814453 seconds

instrumentor.uninstrument_instance_classes(classification_model)

classification_model.predict(X)
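The clone problem can be reproduced in plain Python. The sketch below uses toy_clone, a hypothetical and much simplified analogue of sklearn.base.clone that only rebuilds the object from its constructor parameters; the real clone also clones nested parameters, but it likewise does not copy other instance attributes.

```python
import functools


class ToyEstimator:
    """Hypothetical estimator with a single constructor parameter."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def get_params(self):
        return {"alpha": self.alpha}

    def fit(self, X):
        return self


def toy_clone(estimator):
    """Rough analogue of sklearn.base.clone: rebuild from constructor params.

    Only the parameters are copied; any other instance attributes, including
    instance-level instrumentation, are lost.
    """
    return type(estimator)(**estimator.get_params())


CALLS = []


def record(func):
    """Toy decorator that appends the method name to CALLS."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        CALLS.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


# Instance-level: only the original object is patched, so its clone is not.
est = ToyEstimator(alpha=0.5)
est.fit = record(est.fit)
toy_clone(est).fit([1])
print(CALLS)  # []

# Class-level: the class is patched, so every instance, clones included, is.
ToyEstimator.fit = record(ToyEstimator.fit)
toy_clone(est).fit([1])
print(CALLS)  # ['fit']
```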

Instruments

The package comes with a handful of instruments that log information about X or the execution time of methods. You can create your own instrument by writing a decorator that follows this pattern:

from functools import wraps


def my_instrumentation(estimator, func, **dkwargs):
    """Wrap an estimator method with instrumentation.

    :param estimator: The class or instance on which to apply instrumentation.
    :param func: The method to be instrumented.
    :param dkwargs: Decorator kwargs, which can be passed to the
        decorator at decoration time. For estimator instrumentation
        this allows different parametrizations for each ml model.
    """
    @wraps(func)
    def wrapper(*args, **kwargs):
        """Wrapping function.

        :param args: The args passed to methods, typically
            just ``X`` and/or ``y``
        :param kwargs: The kwargs passed to methods, usually
            weights or other params
        """
        # Code goes here before execution of the estimator method
        retval = func(*args, **kwargs)
        # Code goes here after execution of the estimator method
        return retval

    return wrapper
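As a concrete instance of this pattern, the sketch below logs any exception raised by an estimator method and then re-raises it. The function exception_logger and the class ToyModel are hypothetical examples, not instruments shipped with the package, and the manual decoration step only approximates how class-level instrumentation is assumed to apply an instrument.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def exception_logger(estimator, func, **dkwargs):
    """Instrument following the pattern above: log any exception raised by
    the wrapped estimator method, then re-raise it unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("%s raised an exception", func.__qualname__)
            raise
    return wrapper


class ToyModel:
    """Hypothetical estimator-like class used only for this illustration."""

    def fit(self, X):
        if not X:
            raise ValueError("X is empty")
        return self


# Decorate by hand, approximating class-level instrumentation.
ToyModel.fit = exception_logger(ToyModel, ToyModel.fit)
try:
    ToyModel().fit([])
except ValueError as err:
    print("re-raised:", err)  # re-raised: X is empty
```

With the package installed, the same function could be passed as SklearnInstrumentor(instrument=exception_logger).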

To create a stateful instrument, use a class and implement the decorator in its __call__ method:

from functools import wraps

from sklearn_instrumentation.instruments.base import BaseInstrument


class MyInstrument(BaseInstrument):

    def __init__(self, *args, **kwargs):
        # handle any statefulness here
        pass

    def __call__(self, estimator, func, **dkwargs):
        """Wrap an estimator method with instrumentation.

        :param estimator: The class or instance on which to apply instrumentation.
        :param func: The method to be instrumented.
        :param dkwargs: Decorator kwargs, which can be passed to the
            decorator at decoration time. For estimator instrumentation
            this allows different parametrizations for each ml model.
        """
        @wraps(func)
        def wrapper(*args, **kwargs):
            """Wrapping function.

            :param args: The args passed to methods, typically
                just ``X`` and/or ``y``
            :param kwargs: The kwargs passed to methods, usually
                weights or other params
            """
            # Code goes here before execution of the estimator method
            retval = func(*args, **kwargs)
            # Code goes here after execution of the estimator method
            return retval

        return wrapper
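A filled-in stateful instrument might accumulate timings across calls. In the sketch below, CumulativeTimer and ToyModel are hypothetical; a plain class is used so the example runs without the package installed, whereas an instrument meant for the package would subclass BaseInstrument as in the template above.

```python
import functools
import time
from collections import defaultdict


class CumulativeTimer:
    """Stateful instrument: total wall-clock seconds and call counts per
    method name, accumulated across every call the instrument wraps."""

    def __init__(self):
        self.total_seconds = defaultdict(float)
        self.calls = defaultdict(int)

    def __call__(self, estimator, func, **dkwargs):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if the method raises, so failed calls are counted too.
                self.total_seconds[func.__name__] += time.perf_counter() - start
                self.calls[func.__name__] += 1
        return wrapper


class ToyModel:
    """Hypothetical estimator-like class used only for this illustration."""

    def predict(self, X):
        return [0 for _ in X]


timer = CumulativeTimer()
ToyModel.predict = timer(ToyModel, ToyModel.predict)
model = ToyModel()
for _ in range(3):
    model.predict([1, 2])
print(dict(timer.calls))  # {'predict': 3}
```

Keeping the state on the instrument object means one timer can aggregate results from every estimator it is applied to.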

To parametrize the instrument differently for each ML model, pass decorator kwargs through instrument_kwargs:

instrumentor = SklearnInstrumentor(instrument=my_instrument)

instrumentor.instrument_instance(estimator=ml_model_1, instrument_kwargs={"name": "awesome_model"})
instrumentor.instrument_instance(estimator=ml_model_2, instrument_kwargs={"name": "better_model"})
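On the instrument side, such kwargs arrive in **dkwargs at decoration time, per the docstring above. The sketch below reads a name kwarg and tags every message with it; tagged_logger and ToyModel are hypothetical, and the manual per-instance decoration stands in for what instrument_kwargs is assumed to forward.

```python
import functools

MESSAGES = []


def tagged_logger(estimator, func, **dkwargs):
    """Toy instrument that tags each message with the ``name`` decorator kwarg."""
    name = dkwargs.get("name", "unnamed")  # hypothetical kwarg, read once at decoration time

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        MESSAGES.append(f"[{name}] {func.__name__} called")
        return func(*args, **kwargs)
    return wrapper


class ToyModel:
    """Hypothetical estimator-like class used only for this illustration."""

    def predict(self, X):
        return list(X)


model_1, model_2 = ToyModel(), ToyModel()
# Decorate each instance by hand with its own kwargs, standing in for what
# instrument_kwargs={"name": ...} is assumed to forward as **dkwargs.
model_1.predict = tagged_logger(model_1, model_1.predict, name="awesome_model")
model_2.predict = tagged_logger(model_2, model_2.predict, name="better_model")
model_1.predict([1])
model_2.predict([2])
print(MESSAGES)  # ['[awesome_model] predict called', '[better_model] predict called']
```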

Project details

Release history

0.13.0 (this version): Jul 21, 2022
0.12.0: Jun 6, 2022
0.11.0: May 23, 2022
0.10.0: Mar 23, 2022
0.9.0: Mar 23, 2022
0.8.0: Mar 22, 2022
0.7.0: May 11, 2021
0.6.1: Apr 18, 2021
0.6.0: Apr 18, 2021
0.5.0: Mar 24, 2021
0.4.1: Dec 14, 2020
0.4.0: Dec 6, 2020
0.3.0: Nov 20, 2020
0.2.0: Nov 13, 2020
0.1.1: Nov 12, 2020
0.1.0: Nov 12, 2020

Download files

Source Distribution

sklearn-instrumentation-0.13.0.tar.gz (21.7 kB), uploaded Jul 21, 2022

Built Distribution

sklearn_instrumentation-0.13.0-py3-none-any.whl (27.4 kB, Python 3), uploaded Jul 21, 2022

Hashes for sklearn-instrumentation-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`7fddb460b68a6c0eea05fcbe7ed17c8ba92afeedd4da395a7a09829ab62011ce`
MD5	`ce5ffd77cae6bab62f12e3c31302c4da`
BLAKE2b-256	`7cc5e0a53abc272cf576aaf0c968c6ba68bc0e1be9da0f949c4384ec04e6693b`

Hashes for sklearn_instrumentation-0.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`994d4aea6356f7124d891b1cf2a38611db5d376958ec2ed57b735a0c0a788ada`
MD5	`9c1cbf917b93e20a8836516bc62922d3`
BLAKE2b-256	`e913845f59b1cb56ca8a62b94a56f9cacb0cff163ab4602f6321b1802e8eb2ea`