Find the best probability distribution for your dataset

Phitter analyzes datasets and determines the best analytical probability distributions that represent them. The Phitter kernel covers more than 80 probability distributions, both continuous and discrete, and provides 3 goodness-of-fit tests and interactive visualizations. For each selected probability distribution, a standard modeling guide is provided along with spreadsheets that detail the methodology for using the chosen distribution in data science, operations research, and artificial intelligence.

This repository contains the Python implementation and the kernel for Phitter Web.

Installation

Requirements

python: >=3.9

PyPI

pip install phitter

Usage

General

import phitter

data: list[int | float] = [...]

phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

Full continuous implementation

import phitter

data: list[int | float] = [...]

phitter_cont = phitter.PHITTER(
    data=data,
    fit_type="continuous",
    num_bins=15,
    confidence_level=0.95,
    minimum_sse=1e-2,
    distributions_to_fit=["beta", "normal", "fatigue_life", "triangular"],
)
phitter_cont.fit(n_jobs=6)
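
The `num_bins` and `minimum_sse` arguments suggest that candidate distributions are compared against a histogram of the data using a sum of squared errors (SSE). The sketch below illustrates that general idea with the standard library only. It is an assumption about the technique, not Phitter's actual implementation, and the helper names `histogram_density` and `sse_against_pdf` are hypothetical.

```python
import random
import statistics


def histogram_density(data, num_bins):
    """Return (bin_centers, densities) for an equal-width histogram."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The maximum value would index one past the end, so clamp it.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    centers = [lo + (i + 0.5) * width for i in range(num_bins)]
    # Normalize counts so the histogram integrates to 1.
    densities = [c / (len(data) * width) for c in counts]
    return centers, densities


def sse_against_pdf(data, pdf, num_bins=15):
    """Sum of squared errors between histogram densities and a candidate PDF."""
    centers, densities = histogram_density(data, num_bins)
    return sum((d - pdf(c)) ** 2 for c, d in zip(centers, densities))


random.seed(0)
data = [random.gauss(50, 5) for _ in range(5000)]

# Candidate 1: a normal distribution with parameters estimated from the data.
normal = statistics.NormalDist.from_samples(data)

# Candidate 2: a uniform distribution over the observed range.
lo, hi = min(data), max(data)


def uniform_pdf(x):
    return 1.0 / (hi - lo)


sse_normal = sse_against_pdf(data, normal.pdf)
sse_uniform = sse_against_pdf(data, uniform_pdf)
print(f"normal SSE:  {sse_normal:.6f}")
print(f"uniform SSE: {sse_uniform:.6f}")
```

A lower SSE means the candidate PDF tracks the histogram more closely, which is one plausible way to rank candidate distributions.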

Full discrete implementation

import phitter

data: list[int | float] = [...]

phitter_disc = phitter.PHITTER(
    data=data,
    fit_type="discrete",
    confidence_level=0.95,
    minimum_sse=1e-2,
    distributions_to_fit=["binomial", "geometric"],
)
phitter_disc.fit(n_jobs=2)

Phitter: properties and methods

import phitter
data: list[int | float] = [...]
phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

phitter_cont.best_distribution # -> dict
phitter_cont.sorted_distributions_sse # -> dict
phitter_cont.not_rejected_distributions # -> dict
phitter_cont.df_sorted_distributions_sse # -> pandas.DataFrame
phitter_cont.df_not_rejected_distributions # -> pandas.DataFrame

Histogram Plot

import phitter
data: list[int | float] = [...]
phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

phitter_cont.plot_histogram()

Histogram PDF Distributions Plot

import phitter
data: list[int | float] = [...]
phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

phitter_cont.plot_histogram_distributions()

Histogram PDF Distribution Plot

import phitter
data: list[int | float] = [...]
phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

phitter_cont.plot_distribution("beta")

ECDF Plot

import phitter
data: list[int | float] = [...]
phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

phitter_cont.plot_ecdf()
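
For reference, an empirical CDF assigns each sorted observation the fraction of the data that is less than or equal to it. The following is a minimal standard-library sketch of that computation. The helper name `ecdf` is hypothetical and this is not Phitter's plotting code.

```python
def ecdf(data):
    """Return the sorted values and the cumulative fraction at each value."""
    xs = sorted(data)
    n = len(xs)
    # The i-th sorted value (0-based) has i + 1 observations at or below it.
    # Tied values are not merged in this simple version.
    ps = [(i + 1) / n for i in range(n)]
    return xs, ps


xs, ps = ecdf([3.2, 1.5, 4.8, 2.1])
print(list(zip(xs, ps)))
# -> [(1.5, 0.25), (2.1, 0.5), (3.2, 0.75), (4.8, 1.0)]
```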

ECDF Distribution Plot

import phitter
data: list[int | float] = [...]
phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

phitter_cont.plot_ecdf_distribution("beta")

QQ Plot

import phitter
data: list[int | float] = [...]
phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

phitter_cont.qq_plot("beta")
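
A QQ plot pairs each sorted observation with the quantile that a fitted distribution predicts at the same cumulative probability. Points close to the diagonal indicate a good fit. The sketch below shows one common way to build those pairs. It uses a normal distribution from the standard library instead of the beta distribution, because the standard library has no beta quantile function. The helper name `qq_points` and the choice of plotting positions are assumptions, not Phitter's implementation.

```python
import random
import statistics


def qq_points(data, dist):
    """Pair each sorted observation with the matching theoretical quantile."""
    observed = sorted(data)
    n = len(observed)
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1),
    # so the inverse CDF is always defined.
    probs = [(i + 0.5) / n for i in range(n)]
    theoretical = [dist.inv_cdf(p) for p in probs]
    return theoretical, observed


random.seed(1)
sample = [random.gauss(10, 2) for _ in range(200)]

# Fit a normal distribution by estimating its mean and standard deviation.
fitted = statistics.NormalDist.from_samples(sample)
theoretical, observed = qq_points(sample, fitted)

# Show the smallest, middle, and largest pairs.
for t, o in (
    (theoretical[0], observed[0]),
    (theoretical[100], observed[100]),
    (theoretical[-1], observed[-1]),
):
    print(f"theoretical={t:.3f}  observed={o:.3f}")
```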

QQ - Regression Plot

import phitter
data: list[int | float] = [...]
phitter_cont = phitter.PHITTER(data)
phitter_cont.fit()

phitter_cont.qq_plot_regression("beta")

Distributions: Methods and properties

import phitter

distribution = phitter.continuous.BETA({"alpha": 5, "beta": 3, "A": 200, "B": 1000})

## CDF, PDF, PPF, and PMF accept a float or numpy.ndarray. Discrete distributions provide PMF instead of PDF. Parameter notation is given in the description of each distribution.
distribution.cdf(752) # -> 0.6242831129533498
distribution.pdf(388) # -> 0.0002342575686629883
distribution.ppf(0.623) # -> 751.5512889417921
distribution.sample(2) # -> [550.800114   514.85410326]
distribution.sample(2) # -> [622.94263263 827.21838464]

## STATS
distribution.mean # -> 700.0
distribution.variance # -> 16666.666666666668
distribution.standard_deviation # -> 129.09944487358058
distribution.skewness # -> -0.3098386676965934
distribution.kurtosis # -> 2.5854545454545454
distribution.median # -> 708.707130841534
distribution.mode # -> 733.3333333333333
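
The statistics above can be cross-checked against the closed-form moments of a four-parameter beta distribution. The sketch below assumes that `alpha` and `beta` are the shape parameters, that `A` and `B` are the lower and upper bounds, and that `kurtosis` is reported as non-excess kurtosis. The helper name `beta_4p_stats` is hypothetical and not part of Phitter.

```python
import math


def beta_4p_stats(alpha, beta, A, B):
    """Closed-form statistics of a beta distribution rescaled to [A, B]."""
    s = alpha + beta
    scale = B - A
    mean = A + scale * alpha / s
    variance = scale**2 * alpha * beta / (s**2 * (s + 1))
    # Skewness and kurtosis do not depend on the location or scale.
    skewness = (
        2 * (beta - alpha) * math.sqrt(s + 1)
        / ((s + 2) * math.sqrt(alpha * beta))
    )
    excess_kurtosis = (
        6 * ((alpha - beta) ** 2 * (s + 1) - alpha * beta * (s + 2))
        / (alpha * beta * (s + 2) * (s + 3))
    )
    # The mode formula is valid only when alpha > 1 and beta > 1.
    mode = A + scale * (alpha - 1) / (s - 2)
    return {
        "mean": mean,
        "variance": variance,
        "standard_deviation": math.sqrt(variance),
        "skewness": skewness,
        "kurtosis": 3 + excess_kurtosis,
        "mode": mode,
    }


stats = beta_4p_stats(alpha=5, beta=3, A=200, B=1000)
for name, value in stats.items():
    print(f"{name}: {value}")
```

With `alpha=5`, `beta=3`, `A=200`, and `B=1000`, these formulas agree with the values listed above, for example a mean of 700 and a mode of about 733.33. The median is omitted because it has no simple closed form.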

Continuous Distributions

• ALPHA • ARCSINE • ARGUS • BETA • BETA PRIME • BETA PRIME 4P • BRADFORD • BURR • BURR 4P • CAUCHY • CHI SQUARE • CHI SQUARE 3P • DAGUM • DAGUM 4P • ERLANG • ERLANG 3P • ERROR FUNCTION • EXPONENTIAL • EXPONENTIAL 2P • F • FATIGUE LIFE • FOLDED NORMAL • FRECHET • F 4P • GAMMA • GAMMA 3P • GENERALIZED EXTREME VALUE • GENERALIZED GAMMA • GENERALIZED GAMMA 4P • GENERALIZED LOGISTIC • GENERALIZED NORMAL • GENERALIZED PARETO • GIBRAT • GUMBEL LEFT • GUMBEL RIGHT • HALF NORMAL • HYPERBOLIC SECANT • INVERSE GAMMA • INVERSE GAMMA 3P • INVERSE GAUSSIAN • INVERSE GAUSSIAN 3P • JOHNSON SB • JOHNSON SU • KUMARASWAMY • LAPLACE • LEVY • LOGGAMMA • LOGISTIC • LOGLOGISTIC • LOGLOGISTIC 3P • LOGNORMAL • MAXWELL • MOYAL • NAKAGAMI • NON CENTRAL CHI SQUARE • NON CENTRAL F • NON CENTRAL T STUDENT • NORMAL • PARETO FIRST KIND • PARETO SECOND KIND • PERT • POWER FUNCTION • RAYLEIGH • RECIPROCAL • RICE • SEMICIRCULAR • TRAPEZOIDAL • TRIANGULAR • T STUDENT • T STUDENT 3P • UNIFORM • WEIBULL • WEIBULL 3P

Discrete Distributions
