Empirical Information Bottleneck

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Information Analysis

Project description

EMBO - Empirical Bottleneck

A Python package for working with the Information Bottleneck [Tishby, Pereira, Bialek 2001] and the Deterministic (and Generalized) Information Bottleneck [Strouse and Schwab 2016]. Embo is especially geared towards the analysis of concrete, finite-size data sets.

Requirements

embo requires Python 3, numpy>=1.7 and scipy.

Installation

To install the latest release, run:

pip install embo

(depending on your system, you may need to use pip3 instead of pip in the command above).

Testing

(requires setuptools). If embo is already installed on your system, look for the copy of the test_embo.py script installed alongside the rest of the embo files and execute it. For example:

python /usr/lib/python3.X/site-packages/embo/test/test_embo.py

Usage

The Information Bottleneck

We refer to [Tishby, Pereira, Bialek 2001] and [Strouse and Schwab 2016] for a general introduction to the Information Bottleneck. Briefly, if X and Y are two random variables, we are interested in finding another random variable M (called the "bottleneck" variable) that solves the following optimisation problem:

min_{p(m|x)} = H(M) - α H(M|X) - β I(M:Y)

for any β>0 and 0≤α≤1, and where M is constrained to be independent on Y conditional on X:

p(x,m,y) = p(x)p(m|x)p(y|x)

Intuitively, we want to find the stochastic mapping p(M|X) that extracts from X as much information about Y as possible while forgetting all irrelevant information. β is a free parameter that sets the relative importance of forgetting irrelevant information versus remembering useful information. α determines what notion of "forgetting" we use: α=1 ("vanilla" bottleneck or IB) implies that we want to minimise the mutual information I(M:X), α=0 (deterministic bottleneck or DIB) that we want to make M a good compression of X by minimising its entropy H(M), and intermediate values interpolate between these two conditions.

Typically, one is interested in the curve described by I(M:Y) as a function of I(M:X) or H(M) at the solution of the bottleneck problem for a range of values of β. This curve gives the optimal tradeoff of compression and prediction, telling us what is the minimum amount of information one needs to know about X (or minimum amount of entropy one needs to retain) to be able to predict Y to a certain accuracy, or vice versa, what is the maximum accuracy one can have in predicting Y given a certain amount of information about X.

Using `embo`

Embo can solve the information bottleneck problem for discrete random variables starting from a set of joint empirical observations. The main point of entry to the package is the InformationBottleneck class. In its constructor, InformationBottleneck takes as arguments an array of observations for X and an (equally long) array of observations for Y, together with other optional parameters (see the docstring for details). In the most basic use case, users can call the get_bottleneck method of an InformationBottleneck object, which will assume α=1 and return a set of β values and the optimal values of I(M:X), I(M:Y) and H(M) corresponding to those β. The optimal tradeoff can then be visualised by plotting I(M:Y) vs I(M:Y).

For instance:

import numpy as np
from matplotlib import pyplot as plt
from embo import InformationBottleneck

# data sequences
x = np.array([0,0,0,1,0,1,0,1,0,1])
y = np.array([0,1,0,1,0,1,0,1,0,1])

# compute the IB bound from the data (vanilla IB; Tishby et al 2001)
I_x,I_y,H_m,β = InformationBottleneck(x,y).get_bottleneck()

# plot the IB bound
plt.plot(I_x,I_y)

Embo can also operate starting from a joint (X,Y) probability distribution, encoded as a 2D array containing the probability of each combination of states for X and Y.

# define joint probability mass function for a 2x2 joint pmf
pxy = np.array([[0.1, 0.4],[0.35, 0.15]]),

# compute IB
I_x,I_y,H_m,β = InformationBottleneck(pxy=pxy).get_bottleneck()

# plot I(M:Y) vs I(M:X)
plt.plot(I_x,I_y)

The deterministic and generalised bottleneck can be computed by setting appropriately the parameter alpha:

# compute Deterministic Information Bottleneck (Strouse 2016)
I_x,I_y,H_m,β = InformationBottleneck(pxy=pxy, alpha=0).get_bottleneck()

# plot I(M:Y) vs H(M)
plt.plot(H_m,I_y)

More examples

The embo/examples directory contains some Jupyter notebook that should exemplify most of the package's functionality.

Basic-Example.ipynb: basics; how to compute and plot an IB bound.
Markov-Chains.ipynb: using embo for past-future bottleneck type analyses on data from Markov chains.
Deterministic-Bottleneck.ipynb: Deterministic and Generalized Information Bottleneck. Here we reproduce a key figure from the Deterministic Bottleneck paper, and we explore the algorithm's behaviour as α changes from 0 to 1.
Compare-embo-dit.ipynb: here we compare embo with dit [James et al 2018]. We compare the solutions found by the two packages on a set of simple IB problems (including a problem taken from dit's documentation), and we show that embo is orders of magnitude faster than dit.

Further details

For more details, please consult the docstrings in InformationBottleneck.

Changelog

See the CHANGELOG.md file for a list of changes from older versions.

Authors

embo is maintained by Eugenio Piasini, Alexandre Filipowicz and Jonathan Levine.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Information Analysis

Release history Release notifications | RSS feed

This version

1.1.0

Feb 22, 2021

1.0.6

Feb 6, 2021

1.0.5

Feb 6, 2021

1.0.4

Feb 5, 2021

1.0.3

Feb 5, 2021

1.0.2

May 6, 2020

1.0.1

Feb 4, 2020

1.0.0

Feb 3, 2020

0.4.0

Jan 23, 2020

0.3.3

Jan 23, 2020

0.3.1

Jan 21, 2020

0.3.1.dev0 pre-release

Jan 21, 2020

0.3.0

Dec 20, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

embo-1.1.0.tar.gz (482.0 kB view hashes)

Uploaded Feb 22, 2021 Source

Built Distribution

embo-1.1.0-py3-none-any.whl (478.4 kB view hashes)

Uploaded Feb 22, 2021 Python 3

Hashes for embo-1.1.0.tar.gz

Hashes for embo-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4dd586b995fa3665f372aea8c13beae4952215cd08c3fbc4d1f7763040c9cb65`
MD5	`021e6416d9f478e85939640ce6676abf`
BLAKE2b-256	`e17df93e3f81fe5677594e35eefc4e91dfbe93016e026f92b63b3260d4a18743`

Hashes for embo-1.1.0-py3-none-any.whl

Hashes for embo-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d85c3f6589d1deb81ce20e4c2ef790cb70cabde74086917d83bba4ac8ddaa80a`
MD5	`2f60b8bb1be20791114cb121b6fa58a3`
BLAKE2b-256	`f83ad07e0550cc9f59c4f18a5a797e2b01e64099b74d6ebb33677a383cbb6d92`

embo 1.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Project description

EMBO - Empirical Bottleneck

Requirements

Installation

Testing

Usage

The Information Bottleneck

Using `embo`

More examples

Further details

Changelog

Authors

Project details

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

embo 1.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Project description

EMBO - Empirical Bottleneck

Requirements

Installation

Testing

Usage

The Information Bottleneck

Using embo

More examples

Further details

Changelog

Authors

Project details

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

Using `embo`