Optimize decision boundary/threshold for predicted probabilities from binary classification

Project description

threshold_optimizer

This Python library allows you to conveniently evaluate predicted probabilities from a binary classification task by presenting you with the optimum probability thresholds.

Introduction

Classification tasks in machine learning involve models or algorithms learning to assign class labels to elements of a set. Binary classification is the process of assigning elements to one of two class labels on the basis of a classification rule. Examples of binary classification include classifying emails as 'spam' or 'not spam', medical tests ('cancer detected' or 'cancer not detected') and churn prediction ('churn' or 'no churn').

Evaluating machine learning models is an important part of building them. These evaluations are done using classification metrics; which metrics to use depends on the nature of the problem you're solving and the cost of falsely predicted values. Common metrics include the confusion matrix, accuracy, precision, recall, F1 score and the ROC curve. However, the class decisions these metrics evaluate are based on a chosen threshold.

For instance, in order to map a probability returned by logistic regression to a binary category, you must define a classification threshold (also called the decision threshold). In, say, a cancer classification task, a value above that threshold indicates "Patient has cancer"; a value below indicates "Patient does not have cancer." It is tempting to assume that the classification threshold should always be 0.5, but thresholds are problem-dependent, and are therefore values that you must tune.
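For example, applying a decision threshold to predicted probabilities is a one-line operation. A minimal sketch, using an arbitrary illustrative threshold of 0.3:

    import numpy as np

    # Predicted probabilities of the positive class,
    # e.g. taken from model.predict_proba(X)[:, 1] in scikit-learn.
    probabilities = np.array([0.10, 0.35, 0.62, 0.80])

    # Every probability at or above the chosen threshold maps to class 1.
    threshold = 0.3  # illustrative value; 0.5 is not always the best choice
    predicted_classes = (probabilities >= threshold).astype(int)
    print(predicted_classes)  # [0 1 1 1]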

This library outputs the optimum threshold value for the metric you're using to evaluate your classification model. The metrics for which you can get the optimum threshold are listed below (a short sketch of computing them at a fixed threshold follows the list):

Accuracy

F1 Score

Recall

Specificity

Precision
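As a rough sketch of what these metrics look like at one fixed threshold, they can all be computed with scikit-learn on a toy example (specificity is derived from the confusion matrix, since scikit-learn has no dedicated function for it):

    import numpy as np
    from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                 precision_score, recall_score)

    y_true = np.array([0, 0, 1, 1, 1, 0])              # ground-truth labels
    y_prob = np.array([0.2, 0.6, 0.7, 0.4, 0.9, 0.1])  # predicted probabilities
    y_pred = (y_prob >= 0.5).astype(int)               # classes at a candidate threshold

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("accuracy:   ", accuracy_score(y_true, y_pred))
    print("f1 score:   ", f1_score(y_true, y_pred))
    print("recall:     ", recall_score(y_true, y_pred))
    print("specificity:", tn / (tn + fp))
    print("precision:  ", precision_score(y_true, y_pred))

Changing the threshold changes y_pred and therefore every one of these scores, which is exactly the trade-off the optimizer searches over.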

Requirements

scikit-learn == 0.24.0

pandas == 0.25.1

numpy == 1.17.1

Installation
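The package can be installed from PyPI with pip; assuming the name matches the distribution files listed under "Download files" below:

    pip install threshold_optimizer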

Usage

Steps to follow

  1. load data and create train, validation and test sets
  2. run model on train data
  3. predict probabilities on validation set
  4. import threshold_optimizer
  5. create threshold_optimizer object
  6. pass predicted probabilities into threshold optimizer object
  7. call threshold_optimizer.optimize_accuracy (or whichever metric) and save the returned probability_threshold_value
  8. predict probabilities on test set
  9. use saved threshold to create binary classes
  10. evaluate the optimized classes with the metric you optimized for (see the end-to-end sketch after this list)
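The sketch below walks through these steps manually with scikit-learn, sweeping candidate thresholds for validation-set accuracy; it illustrates the workflow the library automates and does not reproduce the library's own class and method names:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # 1. Load data and create train, validation and test sets.
    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # 2. Run a model on the training data.
    model = LogisticRegression().fit(X_train, y_train)

    # 3. Predict probabilities on the validation set.
    val_probs = model.predict_proba(X_val)[:, 1]

    # 4-7. Search candidate thresholds and keep the one with the best validation accuracy.
    thresholds = np.linspace(0.01, 0.99, 99)
    accuracies = [accuracy_score(y_val, (val_probs >= t).astype(int)) for t in thresholds]
    best_threshold = thresholds[int(np.argmax(accuracies))]

    # 8-10. Predict probabilities on the test set, apply the saved threshold,
    #       and evaluate with the metric that was optimized for.
    test_probs = model.predict_proba(X_test)[:, 1]
    test_pred = (test_probs >= best_threshold).astype(int)
    print("best threshold:", best_threshold)
    print("test accuracy: ", accuracy_score(y_test, test_pred))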

Key Terminologies

No need for one yet


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

threshold_optimizer-0.0.1a1.tar.gz (5.4 kB)


Built Distribution

threshold_optimizer-0.0.1a1-py3-none-any.whl (6.7 kB)

