an ML library for model development and governance
Rubicon
rubicon is a data science tool for capturing all information related to a model during its development. With minimal effort, Rubicon's Python library can integrate directly into your Python models:
from rubicon import Rubicon
# Configure client object, automatically track git details
rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)
# Create a project to hold a collection of experiments
project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)
# Log experiment data
experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["model"],
)
experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)
accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)
# Tag the data so it's easily filterable
if accuracy >= 0.94:
    experiment.add_tags(["success"])
Explore and visualize Rubicon projects stored locally or in S3 with the CLI:
rubicon ui --root-dir /rubicon-root
Purpose
Rubicon is a data science tool for capturing all information related to a model during its development. It allows data scientists to store model results over time and ensures full auditability and reproducibility.
It offers the following features:
- a Python library for storing and retrieving model inputs, outputs, and analyses to filesystems (local, S3)
- a dashboard for exploring, comparing, and visualizing logged data
- a process for sharing a selected subset of logged data with collaborators
Rubicon is designed to enforce best practices, like automatically linking logged experiments (results) to their corresponding model code. It supports concurrent logging, so multiple experiments can be logged in parallel. It also supports asynchronous communication with S3, so network reads and writes don't block.
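To make the concurrent-logging idea concrete, here is a minimal sketch using only the Python standard library. It is not Rubicon's implementation or API: `log_experiment_record`, `log_in_parallel`, and the one-JSON-file-per-record layout are hypothetical stand-ins chosen for illustration. The point it shows is that when every record goes to its own uniquely named file, parallel writers never contend for the same path.

```python
import json
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def log_experiment_record(root_dir, name, parameters, metrics):
    """Hypothetical stand-in for an experiment logger (NOT Rubicon's API).

    Each record is written to its own uniquely named file, so parallel
    writers never touch the same path and need no locking.
    """
    record_id = uuid.uuid4().hex
    record = {
        "id": record_id,
        "name": name,
        "parameters": parameters,
        "metrics": metrics,
    }
    path = Path(root_dir) / f"{record_id}.json"
    path.write_text(json.dumps(record))
    return path


def log_in_parallel(root_dir, runs, max_workers=4):
    """Log every run in `runs` concurrently and return the written paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(
                log_experiment_record,
                root_dir,
                run["name"],
                run["parameters"],
                run["metrics"],
            )
            for run in runs
        ]
        # result() re-raises any exception raised inside a worker thread
        return [future.result() for future in futures]


root = tempfile.mkdtemp()  # throwaway directory for the demo
runs = [
    {
        "name": f"run-{i}",
        "parameters": {"n_estimators": 10 * (i + 1)},
        "metrics": {"accuracy": 0.9},
    }
    for i in range(8)
]
paths = log_in_parallel(root, runs)
print(f"{len(paths)} records written")
```

Threads suit this sketch because the work is I/O-bound; the same pattern would apply to network writes, where waiting on one request should not hold up the others.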
Documentation
For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.
Install
pip install rubicon-ml
Develop
rubicon uses conda to manage environments. First, install conda. Then use conda to set up a development environment:
conda env create -f ci/environment.yml
conda activate rubicon-dev
Testing
The tests are separated into unit and integration tests. They can be run directly in the activated dev environment via pytest tests/unit or pytest tests/integration, or all at once by simply running pytest.
Note: some integration tests are intentionally marked to control when they are run (i.e. not during CI/CD). These tests include:
- Integration tests that connect to physical filesystems (local, S3). You'll want to configure the root_dir appropriately for these tests (tests/integration/test_async_rubicon.py, tests/integration/test_rubicon.py). They can be run with:
  pytest -m "physical_filesystem_test"
- Integration tests for the dashboard. To run these integration tests locally, you'll need to install one of the WebDrivers. To do so, follow the Install instructions in the Dash Testing Docs or install via brew with brew cask install chromedriver. You may have to update your permissions in Security & Privacy to install with brew. They can be run with:
  pytest -m "dashboard_test"
Note: The --headless flag can be added to run the dashboard tests in headless mode.
Code Formatting
Install and configure pre-commit to automatically run black, flake8, and isort during commits:
- install pre-commit
- run pre-commit install to set up the git hook scripts
Now pre-commit will run automatically on git commit and will ensure consistent code format throughout the project. You can format without committing via pre-commit run or skip these checks with git commit --no-verify.
Contributors
Mike McCarty
Sri Ranganathan
Joe Wolfe
Ryan Soley
Diane Lee
Hashes for rubicon_ml-0.1.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 238e60f8a0865dff41f6cb6afc6ad438c7a789bd0a00e52c5bd4c67c5820df1a
MD5 | ef9ca6036327e3b536bca2d886f2bd9a
BLAKE2b-256 | 6b1679388355ed8a15f8d54ba3dc0ac3513ce1ec2041b19aa40605987278e367
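A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module; the helper names `sha256_of_file` and `matches_published_hash` are made up for illustration, and the commented-out usage assumes the wheel sits in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never loaded whole
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest copied from the hash listing above
EXPECTED_SHA256 = "238e60f8a0865dff41f6cb6afc6ad438c7a789bd0a00e52c5bd4c67c5820df1a"

# Hypothetical usage, assuming the wheel was downloaded to the current directory:
# print(matches_published_hash("rubicon_ml-0.1.2-py3-none-any.whl", EXPECTED_SHA256))
```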