Python package to benchmark GLM implementations.

Project description

glum

Generalized linear models (GLMs) are a core statistical tool that includes many common methods, such as least-squares regression, Poisson regression and logistic regression, as special cases. At QuantCo, we have used GLMs in e-commerce pricing, insurance claims prediction and more. We have developed glum, a fast Python-first GLM library. Development started from a fork of scikit-learn, so glum has a scikit-learn-like API. We are thankful for the starting point provided by Christian Lorentzen in the pull request that this work builds on!
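
To make the "special cases" statement concrete, here is a minimal sketch in plain Python (it does not use glum). It assumes the canonical link for each model: identity for least squares, log for Poisson, logit for logistic regression. The names `INVERSE_LINKS` and `glm_mean` are hypothetical and chosen only for illustration.

```python
import math

# Inverse link functions map the linear predictor eta = intercept + x . beta
# to the predicted mean of the response.
def identity_inverse(eta):
    return eta

def log_inverse(eta):
    return math.exp(eta)

def logit_inverse(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical lookup table: one canonical inverse link per special case.
INVERSE_LINKS = {
    "least-squares": identity_inverse,
    "poisson": log_inverse,
    "logistic": logit_inverse,
}

def glm_mean(x, beta, intercept, model):
    """Predicted mean for one observation under the chosen GLM special case."""
    eta = intercept + sum(xi * bi for xi, bi in zip(x, beta))
    return INVERSE_LINKS[model](eta)

x, beta = [1.0, 2.0], [0.5, -0.25]
for name in INVERSE_LINKS:
    print(name, glm_mean(x, beta, 0.0, name))
```

All three models share the same linear predictor; only the mapping from that predictor to the mean (and the assumed distribution of the response) differs.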

glum is at least as feature-complete as existing GLM libraries like glmnet or h2o. It supports:

Built-in cross validation for optimal regularization, efficiently exploiting a “regularization path”
L1 regularization, which produces sparse and easily interpretable solutions
L2 regularization, including variable matrix-valued (Tikhonov) penalties, which are useful in modeling correlated effects
Elastic net regularization
Normal, Poisson, logistic, gamma, and Tweedie distributions, plus varied and customizable link functions
Box constraints, linear inequality constraints, sample weights, offsets
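
As a rough illustration of how the L1, L2 and elastic net penalties in the list above relate, the sketch below computes the penalty term in plain Python. It assumes the common scikit-learn-style parameterization, in which `alpha` scales the overall penalty and `l1_ratio` mixes the L1 and L2 parts; the function name `elastic_net_penalty` is hypothetical, and the matrix-valued (Tikhonov) case is not covered.

```python
def elastic_net_penalty(coef, alpha, l1_ratio):
    """Penalty alpha * (l1_ratio * ||coef||_1 + 0.5 * (1 - l1_ratio) * ||coef||_2^2).

    Assumed parameterization: l1_ratio=1.0 gives a pure L1 (lasso) penalty,
    l1_ratio=0.0 gives a pure L2 (ridge) penalty.
    """
    l1_norm = sum(abs(c) for c in coef)
    l2_norm_sq = sum(c * c for c in coef)
    return alpha * (l1_ratio * l1_norm + 0.5 * (1.0 - l1_ratio) * l2_norm_sq)

coef = [1.0, -2.0, 0.0]
print(elastic_net_penalty(coef, alpha=0.1, l1_ratio=1.0))  # pure L1
print(elastic_net_penalty(coef, alpha=0.1, l1_ratio=0.0))  # pure L2
print(elastic_net_penalty(coef, alpha=0.1, l1_ratio=0.5))  # elastic net mix
```

The zero coefficient contributes nothing to either norm, which is why sparse solutions are cheap under an L1 penalty.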

This repo also includes tools for benchmarking GLM implementations in the glum_benchmarks module. For details on the benchmarking, see the project documentation. Although the performance of glum relative to glmnet and h2o depends on the specific problem, we find that it is consistently much faster for a wide range of problems.

For more information on glum, including tutorials and API reference, please see the documentation.

An example: predicting car insurance claim frequency using Poisson regression.

This example uses a public French car insurance dataset.

>>> import pandas as pd
>>> import numpy as np
>>> from glum_benchmarks.problems import load_data, generate_narrow_insurance_dataset
>>> from glum_benchmarks.util import get_obj_val
>>> from glum import GeneralizedLinearRegressor
>>>
>>> # Load the French Motor Insurance dataset
>>> dat = load_data(generate_narrow_insurance_dataset)
>>> X, y, sample_weight = dat['X'], dat['y'], dat['sample_weight']
>>>
>>> # Model the number of claims per year as Poisson and regularize using an L1 penalty.
>>> model = GeneralizedLinearRegressor(
...     family='poisson',
...     l1_ratio=1.0,
...     alpha=0.001
... )
>>>
>>> _ = model.fit(X=X, y=y, sample_weight=sample_weight)
>>>
>>> # get_formatted_diagnostics returns details about the steps taken by the iterative solver
>>> diags = model.get_formatted_diagnostics(full_report=True)
>>> diags[['objective_fct']]
        objective_fct
n_iter               
0            0.331670
1            0.328841
2            0.319605
3            0.318660
4            0.318641
5            0.318641
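
One way to read the diagnostics above is to look at how much the objective improves at each iteration. The sketch below is plain Python with the objective values copied from the table. The tolerance-based check is a hypothetical illustration and is not meant to reproduce the solver's actual stopping rule.

```python
# Objective values copied from the diagnostics table above (n_iter 0 to 5).
objective = [0.331670, 0.328841, 0.319605, 0.318660, 0.318641, 0.318641]

def decrements(values):
    """Decrease in the objective between consecutive iterations."""
    return [prev - cur for prev, cur in zip(values, values[1:])]

def first_small_step(values, tol):
    """First iteration whose improvement over the previous one is below tol.

    Hypothetical helper for illustration; returns None if no step is that small.
    """
    for n_iter, step in enumerate(decrements(values), start=1):
        if step < tol:
            return n_iter
    return None

steps = decrements(objective)
print(["%.6f" % s for s in steps])
print(first_small_step(objective, tol=1e-4))
```

The objective never increases, and by the last iteration it no longer changes at the printed precision.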

Installation

Please install the package through conda-forge:

conda install glum -c conda-forge
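
After installing, a quick check from Python can confirm that the package is visible to the interpreter in use. This is a small sketch based only on the standard library; the helper name `installed_version` is hypothetical, and it assumes the distribution and the importable module share the name `glum`.

```python
import importlib.util
from importlib import metadata

def installed_version(dist_name, module_name=None):
    """Return the installed version string, or None if the module cannot be found."""
    module_name = module_name or dist_name
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The module is importable but no distribution metadata was found.
        return "unknown"

print(installed_version("glum"))
```

A result of `None` usually means the package was installed into a different environment than the one running the script.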

Release history

Version	Released	Notes
3.0.0	Apr 27, 2024	
3.0.0a2	Dec 12, 2023	pre-release
3.0.0a1	Aug 29, 2023	pre-release
3.0.0a0	Aug 17, 2023	pre-release
2.7.0	Feb 19, 2024	
2.6.0	Sep 5, 2023	
2.5.2	Jun 2, 2023	
2.5.1	May 19, 2023	
2.5.0	Apr 28, 2023	
2.4.1	Mar 15, 2023	
2.4.0	Jan 31, 2023	
2.3.0	Jan 6, 2023	
2.2.1	Nov 25, 2022	
2.1.2	Jul 1, 2022	
2.1.1	Jul 1, 2022	
2.1.0	Jun 27, 2022	
2.0.3	Nov 5, 2021	
2.0.2	Nov 3, 2021	
2.0.0	Oct 8, 2021	this version

Download files

Download the file for your platform.

Source Distribution

glum-2.0.0.tar.gz (13.0 MB), uploaded Oct 8, 2021

Hashes for glum-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`bdc1d56af5d2407d42926f85367c07b0781bde1d9ca6d0ecfe8378d6f7f67b8f`
MD5	`95de6cd4e4d4e611e5805a16b3ac8da6`
BLAKE2b-256	`8f7e4e33ada93a4274084880b4448c8d52c3d05a9f5f4eeed3ce93d33c471f8e`