Skip to main content

A wrapper for easy plots of learning and validation curves

Project description

sk-modelcurves

A Python wrapper built for software engineers and researchers to facilitate easy creation of learning and validation curve plots from scikit-learn.

The module is meant to complement your workflow in scikit-learn and ease the process of evaluating your models.

The module includes many quality of life features that should save you precious time whenever you want to plot a learning curve to check for bias/variance or plot a validation curve to see the effect of tuning a hyperparameter.

Background

For those not familiar with learning curves, check out Andrew Ng’s excellent discussion of their use at http://cs229.stanford.edu/materials/ML-advice.pdf

Over the process of writing many research papers and building many models, I found myself using boilerplate code that I would copy paste for almost every project whenever I wanted to plot a learning curve or validation curve to evaluate models.

Hopefully, this module will save you a few minutes each time you need to plot a learning or validation curve so you can focus on other things.

Install

Python’s pip is the recommended method of installation. From the terminal:

$ pip install sk_modelcurves

Example Usage

Generate a learning curve using accuracy as a metric and 5-fold cross validation.

Assumes a sklearn estimator called knn, training data matrix called X and training labels called y:

$ from sk_modelcurves.learning_curve import draw_learning_curve
$ draw_learning_curve(knn, X, y, scoring='accuracy', cv=5)
$ plt.show()

Generate multiple learning curves for several estimators using F1 score as a metric, 5-fold cross validation, and names for each of the estimators.

Assumes 3 sklearn estimators called knn2, knn20, knn40, training data matrix called X and training labels called y:

$ from sk_modelcurves.learning_curve import draw_learning_curve
$ draw_learning_curve([knn2, knn20, knn40], X, y, scoring='f1', cv=5,
  estimator_titles=['2 Neighbors', '20 Neighbors', '40 Neighbors'])
$ plt.show()

Many other options are available. Check out the source code docstrings or the upcoming documentation.

Dependencies

sk-modelcurves is tested to work for Python 2.6 and Python 2.7. Python 3.3+ has not been tested and is assumed to not work until tested.

The required dependencies include scikit-learn (of course!), numpy >= 1.6.1, and matplotlib >= 1.1.1.

To run tests, you will need nose >= 1.1.2.

Contributing

Anyone is welcome!

If you find a bug or would like to discuss a potential feature, please file an issue first.

Testing

After installation, you can launch the test suite from outside the source directory (you will need to have the nose package installed):

$ nosetests -v sk_modelcurves

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sk_modelcurves-0.4.tar.gz (5.6 kB view hashes)

Uploaded Source

Built Distribution

sk_modelcurves-0.4-py2-none-any.whl (7.9 kB view hashes)

Uploaded Python 2

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page