
Hierarchical Uniform Manifold Approximation and Projection


.. -*- mode: rst -*-

|conda_version|_ |conda_downloads|_ |pypi_version|_ |pypi_downloads|_

.. |pypi_version| image:: https://img.shields.io/pypi/v/humap.svg
.. _pypi_version: https://pypi.python.org/pypi/humap/

.. |pypi_downloads| image:: https://pepy.tech/badge/humap
.. _pypi_downloads: https://pepy.tech/project/humap

.. |conda_version| image:: https://anaconda.org/conda-forge/humap/badges/version.svg
.. _conda_version: https://anaconda.org/conda-forge/humap

.. |conda_downloads| image:: https://anaconda.org/conda-forge/humap/badges/downloads.svg
.. _conda_downloads: https://anaconda.org/conda-forge/humap

.. image:: images/humap-2M.gif
    :alt: HUMAP exploration on Fashion MNIST dataset

=====
HUMAP
=====

Hierarchical Uniform Manifold Approximation and Projection (HUMAP) is a technique based on `UMAP <https://github.com/lmcinnes/umap/>`_ for hierarchical dimensionality reduction. HUMAP allows you to:

  1. Focus on important information while reducing the visual burden when exploring huge datasets;
  2. Drill down the hierarchy according to information demand.

The details of the algorithm can be found in our paper on `ArXiv <https://arxiv.org/abs/2106.07718>`_. This repository also features a C++ UMAP implementation.


Installation
------------

HUMAP is written in C++ for performance and provides an intuitive Python interface. It depends on common machine learning libraries such as scikit-learn and NumPy, and it needs pybind11 for the interface between C++ and Python.

Requirements:

  • Python 3.6 or greater
  • numpy
  • scipy
  • scikit-learn
  • pybind11
  • pynndescent (for reproducible results)
  • Eigen (C++)
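
As a quick sanity check before installing, the sketch below reports which of the Python requirements above can be imported in the current environment. The helper name ``missing_modules`` is hypothetical and not part of HUMAP; the import names (for example ``sklearn`` for scikit-learn) are the usual ones for these packages. Eigen is a C++ header library, so this check does not cover it.

.. code:: python

    import importlib.util

    # Import names of the Python requirements listed above
    # (scikit-learn is imported as "sklearn").
    REQUIRED_MODULES = ["numpy", "scipy", "sklearn", "pybind11", "pynndescent"]


    def missing_modules(names):
        """Return the names in `names` that cannot be found as importable modules."""
        return [name for name in names if importlib.util.find_spec(name) is None]


    if __name__ == "__main__":
        missing = missing_modules(REQUIRED_MODULES)
        if missing:
            print("Missing requirements:", ", ".join(missing))
        else:
            print("All Python requirements are importable.")

An empty result only means the modules can be located; it does not check their versions.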

If you have these requirements installed, use PyPI:

.. code:: bash

    pip install humap

Alternatively (and preferably), you can install with conda:

.. code:: bash

    conda install humap

If using pip:

HUMAP depends on `Eigen <https://eigen.tuxfamily.org/>`_. Make sure to place its headers in ``/usr/local/include`` on Unix or in ``C:\Eigen`` on Windows.
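
To confirm that the Eigen headers are where the build expects them, a check along these lines may help. This is a minimal sketch: the helper ``has_eigen_headers`` is hypothetical, and it assumes the headers sit in an ``Eigen`` subdirectory of the include directory (so that ``#include <Eigen/Dense>`` resolves). That is Eigen's usual layout, but it is not stated above.

.. code:: python

    import os
    import sys


    def has_eigen_headers(include_dir):
        """Return True if `include_dir` contains Eigen's `Eigen/Dense` header.

        Assumption: the headers sit in an `Eigen` subdirectory of the include dir.
        """
        return os.path.isfile(os.path.join(include_dir, "Eigen", "Dense"))


    if __name__ == "__main__":
        # Locations named in the instructions above.
        if sys.platform.startswith("win"):
            include_dir = r"C:\Eigen"
        else:
            include_dir = "/usr/local/include"
        status = "found" if has_eigen_headers(include_dir) else "not found"
        print("Eigen headers {} in {}".format(status, include_dir))

If the headers are reported as not found, adjust the path to match where Eigen was unpacked on your system.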

Manual installation:

To install HUMAP manually, download the project and proceed as follows:

.. code:: bash

    python setup.py bdist_wheel
    pip install dist/humap*.whl

Usage examples
--------------

The simplest usage of HUMAP is as follows:

Fitting the hierarchy

.. code:: python

    import humap
    from sklearn.datasets import fetch_openml

    X, y = fetch_openml('mnist_784', version=1, return_X_y=True)

    # build a hierarchy with three levels
    hUmap = humap.HUMAP([0.2, 0.2])
    hUmap.fit(X, y)

    # embed level 2
    embedding2 = hUmap.transform(2)
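
To get a feel for how large each level might be, the sketch below does the arithmetic for the ``[0.2, 0.2]`` argument used above. It rests on an assumption not stated in this README: that each value is the fraction of points kept from the level below. The helper ``level_sizes`` is hypothetical and does not call HUMAP; the actual sizes HUMAP produces may differ.

.. code:: python

    def level_sizes(n_points, fractions):
        """Return approximate point counts per level, starting at level 0 (all data).

        Assumption: each fraction is applied to the size of the level below it.
        """
        sizes = [n_points]
        for fraction in fractions:
            sizes.append(round(sizes[-1] * fraction))
        return sizes


    # MNIST has 70,000 samples; two fractions give three levels.
    print(level_sizes(70000, [0.2, 0.2]))

Under that assumption the script prints ``[70000, 14000, 2800]``, so the top level would hold only a small sample of the data.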

Refer to ``notebooks/`` for complete examples.

C++ UMAP implementation

You can also fit a one-level HUMAP hierarchy, which is essentially a plain UMAP projection.

.. code:: python

    import humap

    # X is the data matrix loaded in the previous example
    umap_reducer = humap.UMAP()
    embedding = umap_reducer.fit_transform(X)

Citation
--------

Please use the following reference to cite HUMAP in your work:

.. code:: bibtex

    @misc{marciliojr_humap2021,
      title={HUMAP: Hierarchical Uniform Manifold Approximation and Projection},
      author={Wilson E. Marcílio-Jr and Danilo M. Eler and Fernando V. Paulovich and Rafael M. Martins},
      year={2021},
      eprint={2106.07718},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
    }

License
-------

HUMAP is distributed under the 3-clause BSD license. It uses the open-source NNDescent implementation from `EFANNA <https://github.com/ZJULearning/efanna>`_ and a C++ implementation of `UMAP <http://github.com/lmcinnes/umap>`__ for embedding hierarchy levels.

E-mail me (wilson_jr at outlook.com) if you would like to contribute.
