
an ML library for model development and governance


Rubicon


rubicon is a data science tool for capturing all information related to a model during its development. With minimal effort, Rubicon's Python library can integrate directly into your Python models:

from rubicon import Rubicon

# Configure client object, automatically track git details
rubicon = Rubicon(
    persistence="filesystem", root_dir="/rubicon-root", auto_git_enabled=True
)

# Create a project to hold a collection of experiments
project = rubicon.create_project(
    "Hello World", description="Using rubicon to track model results over time."
)

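# Log experiment data
# Note: SklearnTrainingMetadata is not imported in this snippet; it is assumed
# to be defined or imported by your own code.
experiment = project.log_experiment(
    training_metadata=[SklearnTrainingMetadata("sklearn.datasets", "my-data-set")],
    model_name="My Model Name",
    tags=["model"],
)

# n_estimators, n_features, random_state, rfc, X_test, and y_test are assumed
# to come from your existing model training code
experiment.log_parameter("n_estimators", n_estimators)
experiment.log_parameter("n_features", n_features)
experiment.log_parameter("random_state", random_state)

accuracy = rfc.score(X_test, y_test)
experiment.log_metric("accuracy", accuracy)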

# Tag the data so it's easily filterable
if accuracy >= 0.94:
    experiment.add_tags(["success"])
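
To illustrate why tagging helps, here is a minimal, library-independent sketch of filtering logged results by tag. The records are plain dictionaries invented for illustration. The field names and the filter_by_tags helper are hypothetical and are not part of Rubicon's API.

```python
from typing import Dict, Iterable, List


def filter_by_tags(experiments: Iterable[Dict], required: Iterable[str]) -> List[Dict]:
    """Return the experiments whose tags include every tag in `required`."""
    required_set = set(required)
    return [e for e in experiments if required_set <= set(e.get("tags", []))]


# Hypothetical records standing in for logged experiments.
experiments = [
    {"name": "run-1", "accuracy": 0.95, "tags": ["model", "success"]},
    {"name": "run-2", "accuracy": 0.91, "tags": ["model"]},
    {"name": "run-3", "accuracy": 0.96, "tags": ["model", "success"]},
]

# Keep only the runs that were tagged as successful.
successful = filter_by_tags(experiments, ["success"])
print([e["name"] for e in successful])
```

Tagging at logging time, as in the snippet above, means the threshold decision is recorded once with the experiment and does not have to be recomputed when results are compared later.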

Explore and visualize Rubicon projects stored locally or in S3 with the CLI:

rubicon ui --root-dir /rubicon-root

Purpose

Rubicon is a data science tool for capturing all information related to a model during its development. It allows data scientists to store model results over time and ensures full auditability and reproducibility.

It offers the following features:

  • a Python library for storing and retrieving model inputs, outputs, and analyses to filesystems (local, S3)

  • a dashboard for exploring, comparing, and visualizing logged data

  • a process for sharing a selected subset of logged data with collaborators

Rubicon is designed to enforce best practices, such as automatically linking logged experiments (results) to their corresponding model code. It supports concurrent logging, so multiple experiments can be logged in parallel, and asynchronous communication with S3, so network reads and writes don’t block.
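
The two practices described above can be sketched with the standard library alone. This is not Rubicon's implementation: the function names, the JSON record layout, and the one-file-per-experiment scheme are assumptions made for illustration. It records the current git commit alongside each result and writes each experiment to its own uniquely named file, so parallel writers never touch the same path.

```python
import json
import subprocess
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Optional


def current_commit_hash(repo_dir: str = ".") -> Optional[str]:
    """Return the current git commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_dir is not a repository with commits
        return None
    return result.stdout.strip()


def log_experiment(root_dir: Path, name: str, metrics: dict, commit: Optional[str]) -> Path:
    """Write one experiment record to its own uniquely named JSON file."""
    path = root_dir / f"{uuid.uuid4()}.json"
    record = {"name": name, "metrics": metrics, "commit_hash": commit}
    path.write_text(json.dumps(record))
    return path


def log_in_parallel(root_dir: Path, runs: dict) -> list:
    """Log every run concurrently; unique file names avoid write conflicts."""
    commit = current_commit_hash()
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(log_experiment, root_dir, name, metrics, commit)
            for name, metrics in runs.items()
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # Hypothetical runs; the accuracy values are made up for the demo.
    runs = {f"run-{i}": {"accuracy": 0.9 + i / 100} for i in range(5)}
    with tempfile.TemporaryDirectory() as tmp:
        paths = log_in_parallel(Path(tmp), runs)
        print(f"logged {len(paths)} experiments")
```

Giving every record its own file is one simple way to make concurrent logging safe on a plain filesystem, and storing the commit hash with each record is what lets a result be traced back to the code that produced it.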

Documentation

For a full overview, visit the docs. If you have suggestions or find a bug, please open an issue.

Install

pip install rubicon-ml

Develop

rubicon uses conda to manage environments. First, install conda. Then use conda to set up a development environment:

conda env create -f ci/environment.yml
conda activate rubicon-dev

Testing

The tests are separated into unit and integration tests. Run them directly in the activated dev environment via pytest tests/unit or pytest tests/integration, or simply run pytest to execute all of them.

Note: some integration tests are intentionally marked to control when they are run (i.e., not during CI/CD). These tests include:

  • Integration tests that connect to physical filesystems (local, S3). Configure the root_dir appropriately for these tests (tests/integration/test_async_rubicon.py, tests/integration/test_rubicon.py), then run them with:

    pytest -m "physical_filesystem_test"
    
  • Integration tests for the dashboard. To run these integration tests locally, you'll need to install one of the WebDrivers. To do so, follow the Install instructions in the Dash Testing Docs or install via brew with brew install --cask chromedriver (brew cask install chromedriver on older Homebrew versions). You may have to update your permissions in Security & Privacy to install with brew. Then run them with:

    pytest -m "dashboard_test"
    

    Note: The --headless flag can be added to run the dashboard tests in headless mode.

Code Formatting

Install and configure pre-commit to automatically run black, flake8, and isort during commits:

  • install pre-commit

  • run pre-commit install to set up the git hook scripts

Now pre-commit will run automatically on git commit and will ensure consistent code format throughout the project. You can format without committing via pre-commit run or skip these checks with git commit --no-verify.

Contributors


Mike McCarty


Sri Ranganathan


Joe Wolfe


Ryan Soley


Diane Lee

