anamod

Feature Importance Analysis of Models

Overview

anamod is a Python library that implements model-agnostic algorithms for the feature importance analysis of trained black-box models. It is designed to serve the larger goal of interpretable machine learning by using different abstractions over features to interpret models. At a high level, anamod implements the following algorithms:

Given a learned model and a hierarchy over features, it (i) tests feature groups in addition to base features, to determine the level of resolution at which important features can be identified, (ii) uses hypothesis testing to rigorously assess the effect of each feature on the model’s loss, (iii) employs a hierarchical approach to control the false discovery rate when testing feature groups and individual base features for importance, and (iv) uses hypothesis testing to identify important interactions among features and feature groups. More details may be found in the following paper:
```
Lee, Kyubin, Akshay Sood, and Mark Craven. 2019. “Understanding Learned Models by
Identifying Important Features at the Right Resolution.”
In Proceedings of the AAAI Conference on Artificial Intelligence, 33:4155–63.
https://doi.org/10.1609/aaai.v33i01.33014155.
```
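As a rough illustration of steps (i) to (iii), the sketch below estimates an empirical p-value for each feature group by permuting its columns and checking whether the loss increases, then applies the flat Benjamini-Hochberg procedure across groups. It is a simplification, not anamod's implementation: the paper controls the false discovery rate hierarchically, whereas this sketch does not, and the toy model, data, group names, and helper functions are all hypothetical. It uses only the Python standard library.

```python
import random


def toy_model(row):
    """Hypothetical trained model: depends on columns 0 and 1, ignores 2 and 3."""
    return 3.0 * row[0] + 2.0 * row[1]


def mean_squared_error(model, X, y):
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)


def permute_columns(X, columns, rng):
    """Copy X, shuffling the given columns jointly across rows."""
    order = list(range(len(X)))
    rng.shuffle(order)
    permuted = [list(row) for row in X]
    for i, j in enumerate(order):
        for column in columns:
            permuted[i][column] = X[j][column]
    return permuted


def permutation_p_value(model, X, y, columns, num_permutations=200, seed=0):
    """Empirical p-value for H0: permuting `columns` does not increase the loss."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, X, y)
    not_worse = sum(
        mean_squared_error(model, permute_columns(X, columns, rng), y) <= baseline
        for _ in range(num_permutations)
    )
    return (not_worse + 1) / (num_permutations + 1)


def benjamini_hochberg(p_values, alpha=0.05):
    """Names rejected by the flat (non-hierarchical) Benjamini-Hochberg procedure."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    cutoff = 0
    for rank, (_, p_value) in enumerate(ranked, start=1):
        if p_value <= alpha * rank / len(ranked):
            cutoff = rank
    return {name for name, _ in ranked[:cutoff]}


# Synthetic data: the target is the toy model's output plus a little noise
rng = random.Random(42)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]
y = [toy_model(row) + rng.gauss(0, 0.1) for row in X]

# Two feature groups and one base feature from each (a two-level hierarchy)
groups = {"signal": [0, 1], "x0": [0], "noise": [2, 3], "x2": [2]}
p_values = {name: permutation_p_value(toy_model, X, y, cols) for name, cols in groups.items()}
important = benjamini_hochberg(p_values)
print(sorted(important))
```

In this toy setup only the entries the model actually uses (`signal` and `x0`) are flagged. Permuting the ignored columns leaves the loss unchanged, so their p-values are 1.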
Given a learned temporal or sequence model, it identifies important temporal features and interactions. More details may be found in the following paper:
```
[In preparation]
```
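Because the paper is still in preparation, the sketch below does not reproduce its method. It only illustrates one generic way to ask where in a sequence a feature matters: permute that feature across instances inside a time window and measure how much the loss increases. The toy model, the data, and every function name are hypothetical, and only the Python standard library is used.

```python
import random


def toy_sequence_model(sequence):
    """Hypothetical model: mean of feature 0 over the last three timesteps."""
    return sum(step[0] for step in sequence[-3:]) / 3.0


def mean_squared_error(model, X, y):
    return sum((model(seq) - target) ** 2 for seq, target in zip(X, y)) / len(y)


def permute_window(X, feature, start, stop, rng):
    """Copy X, shuffling one feature across instances for timesteps in [start, stop)."""
    order = list(range(len(X)))
    rng.shuffle(order)
    permuted = [[list(step) for step in seq] for seq in X]
    for i, j in enumerate(order):
        for t in range(start, stop):
            permuted[i][t][feature] = X[j][t][feature]
    return permuted


def window_loss_increase(model, X, y, feature, start, stop, seed=0):
    """Increase in loss after permuting `feature` inside the window [start, stop)."""
    rng = random.Random(seed)
    permuted = permute_window(X, feature, start, stop, rng)
    return mean_squared_error(model, permuted, y) - mean_squared_error(model, X, y)


# Synthetic data: 30 sequences, 6 timesteps, 2 features per timestep
rng = random.Random(7)
X = [[[rng.gauss(0, 1) for _ in range(2)] for _ in range(6)] for _ in range(30)]
y = [toy_sequence_model(seq) for seq in X]

early = window_loss_increase(toy_sequence_model, X, y, feature=0, start=0, stop=3)
late = window_loss_increase(toy_sequence_model, X, y, feature=0, start=3, stop=6)
other = window_loss_increase(toy_sequence_model, X, y, feature=1, start=3, stop=6)
print(early, late, other)
```

Only the late window of feature 0 raises the loss, because the toy model reads nothing else. A real analysis would also need a statistical test on the loss differences rather than a single permutation.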

anamod supersedes the existing library mihifepe (https://github.com/Craven-Biostat-Lab/mihifepe), which is based on the first paper, and contains its functionality. mihifepe is maintained for legacy reasons but will not receive further significant updates.

anamod uses the synmod library to generate synthetic data, including time-series data, to test and validate the algorithms (https://github.com/cloudbopper/synmod).

Usage

See detailed API documentation at https://anamod.readthedocs.io/en/latest/usage.html. Basic usage:

To analyze a scikit-learn binary classification model:

```python
# Train a model
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

dataset = datasets.load_breast_cancer()
X, y, feature_names = dataset.data, dataset.target, dataset.feature_names
model = LogisticRegression()
model.fit(X, y)

# Analyze the model
import anamod

# Return a vector of probabilities when model.predict is called
model.predict = lambda X: model.predict_proba(X)[:, 1]
analyzer = anamod.ModelAnalyzer(model, X, y, feature_names=feature_names)
features = analyzer.analyze()

# Show the important features in decreasing order of importance score,
# along with each feature's importance score and model coefficient
from pprint import pprint

important_features = sorted(
    (feature for feature in features if feature.important),
    key=lambda feature: feature.effect_size,
    reverse=True,
)
pprint([(feature.name, feature.effect_size, model.coef_[0][feature.idx[0]]) for feature in important_features])
```
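The example above replaces `model.predict` on the fitted instance. One alternative, sketched below, is a small wrapper object whose `predict` returns the positive-class probabilities, which leaves the original model untouched. `ProbabilityModel` and `StubClassifier` are hypothetical names used only for illustration, and whether anamod accepts such a wrapper in place of the model itself is an assumption that has not been checked here.

```python
class ProbabilityModel:
    """Hypothetical wrapper: predict() returns one class's probability per row."""

    def __init__(self, classifier, class_index=1):
        self.classifier = classifier
        self.class_index = class_index

    def predict(self, X):
        # predict_proba returns one row of class probabilities per instance
        return [row[self.class_index] for row in self.classifier.predict_proba(X)]


class StubClassifier:
    """Stand-in for a fitted classifier so that the sketch runs without scikit-learn."""

    def predict_proba(self, X):
        # Treat the first feature, clipped to [0, 1], as the positive-class probability
        positive = [min(max(row[0], 0.0), 1.0) for row in X]
        return [[1.0 - p, p] for p in positive]


wrapped = ProbabilityModel(StubClassifier())
probabilities = wrapped.predict([[0.2, 5.0], [0.9, 1.0], [1.5, 0.0]])
print(probabilities)
```

With a real scikit-learn classifier, `predict_proba(X)[:, 1]` gives the same values as a NumPy array, which is likely the more convenient return type.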

To analyze a scikit-learn regression model:

```python
# Train a model
from sklearn import datasets
from sklearn.linear_model import Ridge

dataset = datasets.load_diabetes()
X, y, feature_names = dataset.data, dataset.target, dataset.feature_names
model = Ridge(alpha=1e-2)
model.fit(X, y)

# Analyze the model
import anamod

analyzer = anamod.ModelAnalyzer(model, X, y, feature_names=feature_names)
features = analyzer.analyze()

# Show the important features in decreasing order of importance score,
# along with each feature's importance score and model coefficient
from pprint import pprint

important_features = sorted(
    (feature for feature in features if feature.important),
    key=lambda feature: feature.effect_size,
    reverse=True,
)
pprint([(feature.name, feature.effect_size, model.coef_[feature.idx[0]]) for feature in important_features])
```

Installation

The recommended installation method is via virtual environments and pip. You also need graphviz installed on your system.

When making the virtual environment, specify python3 (3.5+) as the python executable:

```
mkvirtualenv -p python3 anamod
```

To install the latest stable release:

```
pip install anamod
```

Or to install the latest development version from GitHub:

```
pip install git+https://github.com/cloudbopper/anamod.git@master#egg=anamod
```
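To confirm that the prerequisites are in place, a short check like the one below may help. It is a sketch using only the standard library, and it assumes that graphviz is detectable through its `dot` executable on the PATH; `check_environment` is a hypothetical helper, not part of anamod.

```python
import importlib.util
import shutil
import sys


def check_environment(min_python=(3, 5)):
    """Return a dict mapping each prerequisite to whether it was found."""
    return {
        "python": sys.version_info >= min_python,
        # Assumption: a system graphviz install exposes the `dot` executable
        "graphviz": shutil.which("dot") is not None,
        "anamod": importlib.util.find_spec("anamod") is not None,
    }


status = check_environment()
for name, found in status.items():
    print("{}: {}".format(name, "ok" if found else "missing"))
```

Run it inside the virtual environment, since `find_spec` only sees packages installed for the interpreter that executes the script.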

Development

Collaborations and contributions are welcome. If you are interested in helping with development, please take a look at:

https://anamod.readthedocs.io/en/latest/contributing.html

License

anamod is free, open source software, released under the MIT license. See LICENSE for details.

Contact

Akshay Sood

Changelog

Release history

0.1.4: Mar 23, 2022
0.1.3: Nov 4, 2021
0.1.2: Dec 24, 2020
0.1.1: Dec 9, 2020 (this version)
