Coarse approximation linear function with cross validation

Project description

A binomial classifier that implements the Coarse Approximation Linear Function (CALF).

Contact

Rolf Carlson hrolfrc@gmail.com

Install

Use pip to install calfcv.

pip install calfcv

Introduction

This is a Python implementation of the Coarse Approximation Linear Function (CALF). The implementation follows the greedy forward-selection algorithm described in the paper referenced below.
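The greedy forward-selection idea can be sketched in a few lines. This is a minimal illustration of the approach, not the package's implementation: at each step, the unused feature that most improves AUC when added with a coarse weight of -1 or +1 is selected, and selection stops when no addition helps.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def calf_greedy(X, y, max_features=None):
    """Greedy forward selection with coarse weights in {-1, +1}.

    A simplified sketch of the CALF idea.  Assumes X is already
    standardized (zero mean, unit variance) and y is binary.
    """
    n_features = X.shape[1]
    w = np.zeros(n_features)
    best_auc = 0.5  # AUC of a random classifier
    for _ in range(max_features or n_features):
        best = None
        # Try every unused feature with both coarse weights.
        for j in np.flatnonzero(w == 0):
            for sign in (-1.0, 1.0):
                trial = w.copy()
                trial[j] = sign
                auc = roc_auc_score(y, X @ trial)
                if auc > best_auc:
                    best_auc, best = auc, (j, sign)
        if best is None:  # no addition improves the score
            break
        w[best[0]] = best[1]
    return w, best_auc
```

The resulting weight vector has entries in {-1, 0, 1}, matching the coarse coefficients returned by CalfCV below.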

Currently, CalfCV provides classification and prediction for two classes, the binomial case. Multinomial classification with more than two classes is not implemented.

The feature matrix is scaled to have zero mean and unit variance. Cross-validation is used to identify the optimal score and coefficients. CalfCV is designed for use with scikit-learn pipelines and composite estimators.

Example

from calfcv import CalfCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
import numpy as np

Make a classification problem

seed = 42
X, y = make_classification(
    n_samples=30,
    n_features=5,
    n_informative=2,
    n_redundant=2,
    n_classes=2,
    random_state=seed
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)

Train the classifier

The best score is the best mean AUC over the cross-validation folds.

cls = CalfCV().fit(X_train, y_train)
cls.best_score_
0.95

The coefficients for the best score are in [-1, 0, 1].

cls.best_coef_
[-1, 1, 0, 1, 1]

The probabilities of class 1 are in the last row

We vertically stack the ground truth on the top with the probabilities of class 1 on the bottom. We show the first 5 entries.

np.round(np.vstack((y_train, cls.predict_proba(X_train).T))[:, 0:5], 2)
array([[0.  , 1.  , 1.  , 0.  , 0.  ],
       [0.71, 0.05, 0.19, 0.34, 0.54],
       [0.29, 0.95, 0.81, 0.66, 0.46]])

Predicting on the training data should give a slightly higher score than best_score_

That is what we see here: best_score_ is the mean AUC over the cross-validation folds, while this score is computed on the full training set.

roc_auc_score(y_true=y_train, y_score=cls.predict_proba(X_train)[:, 1])
0.9750000000000001

The classifier will likely produce a lower score on unseen data

Often the score on unseen data is lower, but in this small example it is higher.

roc_auc_score(y_true=y_test, y_score=cls.predict_proba(X_test)[:, 1])
1.0

Score using classes is lower than score using probabilities

The ground truth is on the top and the predicted class is on the bottom. The sample at index 6 of y_test is predicted incorrectly; the others are correct.

y_pred = cls.predict(X_test)
np.vstack((y_test, y_pred))
array([[0, 1, 1, 0, 1, 0, 0, 0],
       [0, 1, 1, 0, 1, 0, 1, 0]])
roc_auc_score(y_true=y_test, y_score=y_pred)
0.9
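The drop from 1.0 to 0.9 comes from thresholding: hard 0/1 labels discard the ranking information that probabilities carry. The effect can be reproduced with synthetic scores chosen to mirror the example above (illustrative values, not CalfCV output):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Ground truth as in y_test above.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 0])

# Probabilities that rank every positive above every negative,
# but where one negative (0.6) crosses the 0.5 threshold.
proba = np.array([0.1, 0.9, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4])

# Scoring the probabilities preserves the perfect ranking.
auc_proba = roc_auc_score(y_true, proba)          # 1.0

# Thresholding at 0.5 creates one false positive at index 6,
# which costs ranking information and lowers the AUC.
auc_class = roc_auc_score(y_true, (proba >= 0.5).astype(int))  # 0.9
```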

Authors

The CALF algorithm was designed by Clark D. Jeffries, John R. Ford, Jeffrey L. Tilson, Diana O. Perkins, Darius M. Bost, Dayne L. Filer and Kirk C. Wilhelmsen. This python implementation was written by Rolf Carlson.

References

Jeffries, C.D., Ford, J.R., Tilson, J.L. et al. A greedy regression algorithm with coarse weights offers novel advantages. Sci Rep 12, 5440 (2022). https://doi.org/10.1038/s41598-022-09415-2
