Griddify high-dimensional tabular data for easy visualization and deep learning

Griddify

Redistribute tabular data into a grid for easy visualization and image-based deep learning. This library is greatly inspired by the excellent MolMap library.

Installation

git clone https://github.com/ersilia-os/griddify.git
cd griddify
pip install -e .

Step by step

Get a multidimensional dataset and preprocess it

In this example, we will use a dataset of 200 physicochemical descriptors calculated for about 10k compounds. You can get these data with the following command.

from griddify import datasets

data = datasets.get_compound_descriptors()

It is important that you preprocess your data (impute missing values, normalize, etc.). We provide functionality to do so.

from griddify import Preprocessing

pp = Preprocessing()
pp.fit(data)
data = pp.transform(data)
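
To give a rough idea of what this kind of preprocessing involves, the sketch below fills missing values with the column median and then standardizes each column to zero mean and unit variance. It is plain Python and is not Griddify's actual implementation; the helper name `impute_and_scale` and the choice of median imputation are assumptions made for this example.

```python
import math
import statistics


def impute_and_scale(rows):
    """Median-impute missing values (None or NaN), then z-score each column.

    Hypothetical helper for illustration only; not part of griddify.
    """
    n_cols = len(rows[0])
    columns = []
    for j in range(n_cols):
        col = [row[j] for row in rows]
        observed = [v for v in col if v is not None and not math.isnan(v)]
        median = statistics.median(observed)
        filled = [median if (v is None or math.isnan(v)) else v for v in col]
        mean = statistics.fmean(filled)
        std = statistics.pstdev(filled)
        # Constant columns have zero spread; map them to 0.0 instead of dividing by zero.
        scaled = [(v - mean) / std if std > 0 else 0.0 for v in filled]
        columns.append(scaled)
    # Transpose back to one list per sample.
    return [list(sample) for sample in zip(*columns)]


raw = [
    [1.0, 10.0],
    [2.0, None],
    [3.0, 30.0],
]
clean = impute_and_scale(raw)
```

Here the missing value in the second column is replaced by the median of the observed values before scaling, so every column of `clean` ends up centered on zero.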

Create a 2D cloud of data features

Start by calculating distances between features.

from griddify import FeatureDistances

fd = FeatureDistances(metric="cosine").calculate(data)
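
For intuition, the distances here are computed between features (columns), not between samples (rows). The following is a minimal sketch of a pairwise cosine distance matrix over columns, written in plain Python. The function names are hypothetical and the code does not reproduce how `FeatureDistances` works internally.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Cosine is undefined for an all-zero vector; returning 1.0 is a
        # convention chosen for this sketch.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def feature_distance_matrix(rows):
    """Pairwise cosine distances between the columns (features) of `rows`."""
    features = list(zip(*rows))  # one tuple per feature
    n = len(features)
    return [
        [cosine_distance(features[i], features[j]) for j in range(n)]
        for i in range(n)
    ]


samples = [
    [1.0, 2.0, -1.0],
    [2.0, 4.0, 0.0],
    [3.0, 6.0, 1.0],
]
dist = feature_distance_matrix(samples)
```

With three features, `dist` is a 3x3 symmetric matrix. The first two features are proportional to each other, so their cosine distance is essentially zero.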

You can now obtain a 2D cloud of your data features. By default, UMAP is used.

from griddify import Tabular2Cloud

tc = Tabular2Cloud()
tc.fit(fd)
Xc = tc.transform(fd)

It is always good to inspect the resulting projection. The cloud contains one point per feature in your dataset.

from griddify.plots import cloud_plot

cloud_plot(Xc)

Rearrange the 2D cloud onto a grid

Distribute cloud points on a grid using a linear assignment algorithm.

from griddify import Cloud2Grid

cg = Cloud2Grid()
cg.fit(Xc)
Xg = cg.transform(Xc)
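
The idea behind a linear assignment is to give every cloud point its own grid cell while keeping the total displacement as small as possible. The toy sketch below solves this exactly by brute force over permutations, which is only feasible for a handful of points. For real datasets you would use a dedicated solver such as `scipy.optimize.linear_sum_assignment`. The function `assign_to_grid`, the rescaling step and the squared-distance cost are assumptions of this example, not a description of `Cloud2Grid` internals.

```python
import itertools
import math


def assign_to_grid(points):
    """Assign each 2D point to a distinct grid cell, minimizing total squared distance.

    Brute force over permutations: exact, but only usable for very few points.
    """
    side = math.ceil(math.sqrt(len(points)))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def rescale(value, lo, hi):
        # Map the cloud onto the [0, side - 1] range used by cell coordinates.
        return (value - lo) / (hi - lo) * (side - 1) if hi > lo else 0.0

    scaled = [
        (rescale(x, min(xs), max(xs)), rescale(y, min(ys), max(ys)))
        for x, y in points
    ]
    cells = [(i, j) for i in range(side) for j in range(side)]

    best_cost, best_cells = math.inf, None
    for candidate in itertools.permutations(cells, len(points)):
        cost = sum(
            (px - cx) ** 2 + (py - cy) ** 2
            for (px, py), (cx, cy) in zip(scaled, candidate)
        )
        if cost < best_cost:
            best_cost, best_cells = cost, candidate
    return list(best_cells), side


cloud = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.8, 0.9)]
cells, side = assign_to_grid(cloud)
```

For these four points the grid side is 2, and each point lands in the corner cell closest to it.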

You can check the rearrangement with an arrows plot.

from griddify.plots import arrows_plot

arrows_plot(Xc, Xg)

To continue with the next steps, it is actually more convenient to get mappings as integers. The following method gives you the size of the grid as well.

mappings, side = cg.get_mappings(Xc)

Rearrange your flat data points into grids

Let's go back to the original tabular data. We want to transform the input data, where each sample is represented as a one-dimensional array, into output data where each sample is represented as an image (i.e. a two-dimensional grid). Please ensure that the data are normalized or scaled.

from griddify import Flat2Grid

fg = Flat2Grid(mappings, side)
Xi = fg.transform(data)
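
Conceptually, this step writes each feature value into the grid cell assigned to that feature. The sketch below shows the idea in plain Python. It assumes that mappings are a list of integer (row, column) pairs, one per feature, which may differ from the format actually returned by `cg.get_mappings`. The helper `flat_to_grid` and the zero fill for unused cells are also hypothetical.

```python
def flat_to_grid(sample, mappings, side, fill=0.0):
    """Place each feature value of a flat sample at its (row, col) cell.

    Cells that receive no feature keep the `fill` value.
    """
    grid = [[fill] * side for _ in range(side)]
    for value, (row, col) in zip(sample, mappings):
        grid[row][col] = value
    return grid


# Hypothetical mappings: feature k goes to cell example_mappings[k].
example_mappings = [(0, 0), (1, 1), (0, 1)]
sample = [0.5, -1.2, 2.0]
image = flat_to_grid(sample, example_mappings, side=2)
```

With three features on a 2x2 grid, one cell stays at the fill value. Applying the same function to every row of a dataset yields one small image per sample.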

Explore one sample.

from griddify.plots import grid_plot

grid_plot(Xi[0])

Full pipeline

You can run the full pipeline described above in only a few lines of code.

from griddify import datasets
from griddify import Griddify

data = datasets.get_compound_descriptors()

gf = Griddify(preprocess=True)
gf.fit(data)
Xi = gf.transform(data)

You can find more examples as Jupyter Notebooks in the notebooks folder.

Learn more

The Ersilia Open Source Initiative is on a mission to strengthen research capacity in low-income countries. Please reach out to us if you want to contribute: hello@ersilia.io
