ibex · PyPI

Pandas Adapters For Scikit-Learn

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
Topic
- Scientific/Engineering
- Scientific/Engineering :: Information Analysis

Project description

Ami Tavory, Shahar Azulay, Tali Raveh-Sadka

This library aims for two (somewhat independent) goals:

- providing pandas adapters for estimators conforming to the scikit-learn protocol, in particular those of scikit-learn itself
- providing easier and more succinct ways of combining estimators, features, and pipelines
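As a rough illustration of the first goal, the sketch below wraps a scikit-learn transformer so that it takes and returns DataFrames, keeping the input's row index. The `PandasTransformer` class is hypothetical and is not ibex's implementation; it assumes the wrapped estimator works on plain NumPy arrays and only handles `fit` and `transform`.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.preprocessing import StandardScaler


class PandasTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical adapter: DataFrame in, DataFrame out (not ibex's code)."""

    def __init__(self, est):
        self.est = est

    def fit(self, X, y=None):
        # Fit a clone on the raw values and remember the input columns.
        self.columns_ = list(X.columns)
        self.est_ = clone(self.est).fit(X.to_numpy(), y)
        return self

    def transform(self, X):
        out = self.est_.transform(X.to_numpy())
        # Keep the input's column names when the width is unchanged,
        # otherwise fall back to positional column labels.
        if out.shape[1] == len(self.columns_):
            columns = self.columns_
        else:
            columns = range(out.shape[1])
        # Re-attach the input's row index to the output.
        return pd.DataFrame(out, index=X.index, columns=columns)


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 6.0, 8.0]},
                  index=["x", "y", "z"])
scaled = PandasTransformer(StandardScaler()).fit_transform(df)
print(scaled)
```

The scaled result keeps the row labels `x`, `y`, `z` and the column names `a`, `b`, which a bare `StandardScaler` would discard by returning an array.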

(You might also want to check out the excellent pandas-sklearn which has the same aims, but takes a very different approach.)

The full documentation defines these matters in detail, but the library has an extremely small interface.

TL;DR

The following short example shows the main points of the library. It is an adaptation of the scikit-learn example "Concatenating multiple feature extraction methods". In this example, we build a classifier for the iris dataset using a combination of PCA, univariate feature selection, and a support vector machine classifier.

We first load the Iris dataset into a pandas DataFrame.

>>> import numpy as np
>>> from sklearn import datasets
>>> import pandas as pd
>>>
>>> iris = datasets.load_iris()
>>> features, iris = iris['feature_names'], pd.DataFrame(
...     np.c_[iris['data'], iris['target']],
...     columns=iris['feature_names']+['class'])
>>>
>>> iris.columns
Index([...'sepal length (cm)', ...'sepal width (cm)', ...'petal length (cm)',
       ...'petal width (cm)', ...'class'],
      dtype='object')

Now, we import the relevant steps. Note that, in this example, we import them from ibex.sklearn rather than sklearn.

>>> from ibex.sklearn.svm import SVC as PDSVC
>>> from ibex.sklearn.feature_selection import SelectKBest as PDSelectKBest
>>> from ibex.sklearn.decomposition import PCA as PDPCA

(Of course, it’s possible to import steps from sklearn as well, and use them alongside and together with the steps of ibex.sklearn.)

Finally, we construct a pipeline that, given a DataFrame of features:

- horizontally concatenates a 2-component PCA DataFrame and the best-feature DataFrame into a single DataFrame, then
- passes the result to a support-vector machine classifier that outputs a pandas Series:
```
>>> clf = PDPCA(n_components=2) + PDSelectKBest(k=1) | PDSVC(kernel="linear")
```
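To show how this operator syntax can map onto scikit-learn's own combinators, here is a minimal sketch. The `Step` class is hypothetical and is not how ibex implements it; it assumes that `+` corresponds to `sklearn.pipeline.make_union` and `|` to `sklearn.pipeline.make_pipeline`, and it returns an ordinary scikit-learn pipeline without any pandas handling.

```python
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import SVC


class Step:
    """Hypothetical wrapper adding `+` (union) and `|` (pipe) to an estimator."""

    def __init__(self, est):
        self.est = est

    def __add__(self, other):
        # a + b: run both steps on the same input, concatenate outputs side by side
        return Step(make_union(self.est, other.est))

    def __or__(self, other):
        # a | b: feed the output of a into b
        return Step(make_pipeline(self.est, other.est))


# `+` binds more tightly than `|` in Python, so this parses as
# (PCA + SelectKBest) | SVC, mirroring the ibex expression above.
sk_clf = (Step(PCA(n_components=2)) + Step(SelectKBest(k=1))
          | Step(SVC(kernel="linear"))).est
print(list(sk_clf.named_steps))  # ['featureunion', 'svc']
```

Chaining more than two steps with the same operator nests the unions or pipelines rather than flattening them; a fuller implementation would merge them.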

clf is now a pandas-aware classifier, but otherwise it can be used pretty much like any sklearn estimator. For example,

>>> param_grid = dict(
...     featureunion__pca__n_components=[1, 2, 3],
...     featureunion__selectkbest__k=[1, 2],
...     svc__C=[0.1, 1, 10])
>>> from ibex.sklearn.model_selection import GridSearchCV as PDGridSearchCV
>>> PDGridSearchCV(clf, param_grid=param_grid).fit(iris[features], iris['class']) # doctest: +SKIP
...
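For comparison, a similar search can be run with plain scikit-learn on NumPy arrays. This is a sketch rather than ibex code: it assumes ibex names steps the way `make_pipeline` and `make_union` do (lowercased class names), which is what the parameter names in the grid above suggest. None of the pandas verification or indexing applies here.

```python
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline, make_union
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)

# make_pipeline / make_union name each step after its lowercased class,
# giving the parameter paths featureunion__pca__..., svc__..., etc.
sk_clf = make_pipeline(
    make_union(PCA(n_components=2), SelectKBest(k=1)),
    SVC(kernel="linear"))
param_grid = dict(
    featureunion__pca__n_components=[1, 2, 3],
    featureunion__selectkbest__k=[1, 2],
    svc__C=[0.1, 1, 10])
search = GridSearchCV(sk_clf, param_grid=param_grid).fit(X, y)
print(search.best_params_)
```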

So what does this add to the original version?

- The estimators verify and process their inputs and outputs. They check column names on calls made after fit, and index results according to the index of the inputs. This helps catch bugs.
- It allows writing pandas-munging estimators (see also Multiple-Row Features In The Movielens Dataset).
- Using DataFrame metadata, it allows writing more complex meta-learning algorithms, such as stacking and nested labeled and stratified cross-validation.
- The pipeline syntax is succinct and clear (see Motivation For Shorter Combinations).
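The column check and index-aware output described in the first point can be sketched as follows. `PandasClassifier` is a hypothetical wrapper, not ibex's implementation; it assumes that "verifying column names" means rejecting inputs whose columns (names or order) differ from those seen at fit time.

```python
import pandas as pd
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression


class PandasClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper showing column checks and index-aware output."""

    def __init__(self, est):
        self.est = est

    def fit(self, X, y):
        # Remember the columns seen at fit time.
        self.columns_ = list(X.columns)
        self.est_ = clone(self.est).fit(X.to_numpy(), y)
        return self

    def predict(self, X):
        # Refuse inputs whose columns differ from fit time, instead of
        # silently feeding misaligned values to the wrapped estimator.
        if list(X.columns) != self.columns_:
            raise ValueError(
                f"expected columns {self.columns_}, got {list(X.columns)}")
        # Index the predictions by the input's rows.
        return pd.Series(self.est_.predict(X.to_numpy()), index=X.index)


train = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0], "b": [1.0, 0.0, 1.0, 0.0]})
model = PandasClassifier(LogisticRegression()).fit(train, [0, 0, 1, 1])
pred = model.predict(train.iloc[[3, 0]])  # Series indexed by rows 3 and 0
```

Passing `train[["b", "a"]]` to `predict` raises a `ValueError` here, whereas a bare scikit-learn estimator given the same values as an array would silently swap the features.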

Release history

- 0.1.3 (Apr 21, 2018)
- 0.1.2 (Nov 5, 2017)
- 0.1.1.4 (Aug 11, 2017), this version
- 0.1.1.3 (Aug 10, 2017)
- 0.1.1 (Aug 10, 2017)

Download files

Source Distribution

ibex-0.1.1.4.tar.gz (15.4 kB)

Uploaded Aug 11, 2017 Source

Built Distribution

ibex-0.1.1.4-py3.5.egg (48.5 kB)

Uploaded Aug 11, 2017 Source

Hashes for ibex-0.1.1.4.tar.gz

Algorithm	Hash digest
SHA256	`cd899185a45d8c4546e56add8457bb26bd23cce2c5cb5fc9b704bb5802aae92c`
MD5	`771133891ed0c0a93d73a51e6483c978`
BLAKE2b-256	`8fa35c87dd9edb2ec8dc7edad6b23c60d8fb02e91dc431a7623e1038837000bb`

Hashes for ibex-0.1.1.4-py3.5.egg

Algorithm	Hash digest
SHA256	`e703d40a2c85e749e29e9f7c16cc2959aa93ca5d9286db953b3417008d5ef65b`
MD5	`9ec562149a95b467aa7abf9eef5f9fde`
BLAKE2b-256	`70e41669fb7489cac70370f46d4290c7e9c7f22aa5f04b585c8de366803ec0f5`