Data visualization for multivariate datasets with a nonlinear dependence structure

Project description

Copulogram

This package provides a new data visualisation tool for exploring multivariate datasets, developed with V. Chabridon.

A copulogram decomposes a multivariate dataset into the effects of the marginals and those of the dependence between features. To do so, it represents the marginals with univariate kernel density estimation plots or histograms (diagonal), and the dependence structure with scatter plots in the ranked space (upper triangle). In the lower triangle, the scatter plots are drawn in the physical space, which combines the effects of the marginals and the dependencies. Since the dependence structure is theoretically modeled by an underlying copula, this plot is called a copulogram, generalizing the well-known "correlogram" to nonlinear dependencies. It gives a concise, empirical decomposition of the dataset.
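The "ranked space" used in the upper triangle can be illustrated with a minimal sketch. This is not the package's implementation: the helper `to_ranked_space` and the synthetic `sample` are hypothetical names introduced here, and the normalization by `n + 1` is one common convention, assumed for illustration. Each column is replaced by its normalized ranks, which removes the marginals and leaves only the dependence structure.

```python
import numpy as np
import pandas as pd


def to_ranked_space(df: pd.DataFrame) -> pd.DataFrame:
    """Replace each numeric column by its normalized ranks in (0, 1).

    Hypothetical helper for illustration; dividing by n + 1 is an assumed
    convention that keeps the values strictly inside the unit interval.
    """
    numeric = df.select_dtypes(include="number")
    return numeric.rank(method="average") / (len(numeric) + 1)


# Synthetic data with a skewed marginal and a nonlinear dependence.
rng = np.random.default_rng(0)
x = rng.lognormal(size=500)
y = x**2 + rng.normal(scale=0.5, size=500)
sample = pd.DataFrame({"x": x, "y": y})

# Ranked-space version of the data (what the upper triangle would scatter).
ranked = to_ranked_space(sample)

# A monotone change of one marginal (here a log) leaves the ranks unchanged.
rescaled = to_ranked_space(sample.assign(x=np.log(sample["x"])))
print(np.allclose(ranked, rescaled))
```

Because ranks only depend on the ordering of the values, a scatter plot of `ranked` looks the same whatever increasing transformation is applied to a marginal, whereas a scatter plot of `sample` in the physical space changes.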

Copulogram of a wind-wave dataset

Installation

The following command installs the current version of the copulogram package.

~$ pip install copulogram

Example on iris dataset

Using the famous iris dataset, let us plot copulograms with different settings:

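>>> import seaborn as sns
>>> import copulogram as cp

>>> # Load the iris dataset as a pandas DataFrame
>>> data = sns.load_dataset('iris')
>>> copulogram = cp.Copulogram(data)
>>> # Draw with the default settings
>>> copulogram.draw()
Copulogram of iris dataset
>>> # Color the points by species and turn off the KDE on the marginals
>>> copulogram.draw(alpha=0.8, hue='species', kde_on_marginals=False)
Copulogram of iris dataset
>>> # Color the points by species and set the quantile contour levels
>>> copulogram.draw(hue='species', quantile_contour_levels=[0.2, 0.4, 0.6, 0.8])
Copulogram of iris dataset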

References

  • Empirical Bernstein copula: Sancetta, A., & Satchell, S. (2004). The Bernstein Copula and Its Applications to Modeling and Approximations of Multivariate Distributions. Econometric Theory, 20(3), 535–562.

  • Nonparametric copula estimation: Nagler, T., Schellhase, C., & Czado, C. (2017). Nonparametric estimation of simplified vine copula models: comparison of methods. Dependence Modeling, 5(1), 99–120.

  • OpenTURNS: Baudin, M., Lebrun, R., Iooss, B., Popelin, A.L. (2017). OpenTURNS: An Industrial Software for Uncertainty Quantification in Simulation.

  • Wind-waves environmental dataset: The data were generated by a numerical model from ANEMOC (Digital Atlas of Ocean and Coastal Sea States, see http://anemoc.cetmef.developpement-durable.gouv.fr/).

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

copulogram-0.0.4.tar.gz (19.2 kB)

Uploaded Source

Built Distribution

copulogram-0.0.4-py3-none-any.whl (20.1 kB)

Uploaded Python 3
