trimap

TriMap: Large-scale Dimensionality Reduction Using Triplets

Intended Audience
- Developers
- Science/Research
Operating System
- MacOS
- Microsoft :: Windows
- POSIX
- Unix
Programming Language
- C
- Python
- Python :: 3.6
Topic
- Scientific/Engineering

Project description

TriMap is a dimensionality reduction method that uses triplet constraints to form a low dimensional embedding of a set of points. The triplet constraints are of the form “point i is closer to point j than point k”. The triplets are sampled from the high-dimensional representation of the points and a weighting scheme is used to reflect the importance of each triplet.
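To make the triplet form concrete, the following is a minimal sketch of checking whether a constraint "point i is closer to point j than point k" holds for a set of coordinates. It illustrates the constraint only; it is not TriMap's sampling or weighting code, and the helper names `triplet_holds` and `fraction_satisfied` are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def triplet_holds(points, i, j, k):
    """Return True if point i is closer to point j than to point k."""
    return euclidean(points[i], points[j]) < euclidean(points[i], points[k])


def fraction_satisfied(points, triplets):
    """Fraction of (i, j, k) triplets that hold for the given coordinates."""
    if not triplets:
        return 0.0
    held = sum(triplet_holds(points, i, j, k) for i, j, k in triplets)
    return held / len(triplets)


# Toy data (made up for illustration): points 0 and 1 are close, point 2 is far.
high_dim = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
low_dim = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]

# Triplets taken from the high-dimensional representation.
triplets = [(0, 1, 2), (1, 0, 2), (2, 1, 0)]

print("satisfied in the embedding:", fraction_satisfied(low_dim, triplets))
```

An embedding that keeps most of the triplets taken from the high-dimensional data satisfied preserves the relative ordering of distances that those triplets encode.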

TriMap provides a much better global view of the data than other dimensionality reduction methods such as t-SNE, LargeVis, and UMAP. The global structure includes the relative distances between clusters, multiple scales in the data, and the existence of possible outliers.

The following implementation is in Python.

How to use TriMap

TriMap has a transformer API similar to that of scikit-learn estimators. To use TriMap with the default parameters, simply do:

import trimap
from sklearn.datasets import load_digits

digits = load_digits()

embedding = trimap.TRIMAP().fit_transform(digits.data)

To calculate the global score, do:

gs = trimap.TRIMAP(verbose=False).global_score(digits.data, embedding)
print("global score %2.2f" % gs)

Parameters

Unlike other dimensionality reduction methods, TriMap has only a few parameters to tune:

n_inliers: Number of nearest neighbors for forming the nearest neighbor triplets (default = 10).

n_outliers: Number of outliers for forming the nearest neighbor triplets (default = 5).

n_random: Number of random triplets per point (default = 5).

weight_adj: Adjust weights for extreme outliers using a log-transformation (default = 500.0).

lr: Learning rate (default = 1000.0).

n_iters: Number of iterations (default = 400).

The other parameters include:

fast_trimap: Use only ANNOY for nearest-neighbor search (default = True).

opt_method: Optimization method {‘sd’ (steepest descent), ‘momentum’ (GD with momentum), ‘dbd’ (delta-bar-delta, default)}.

verbose: Print the progress report (default = True).

return_seq: Store the intermediate results and return the results in a tensor (default = False).

An example of adjusting these parameters:

import trimap
from sklearn.datasets import load_digits

digits = load_digits()

embedding = trimap.TRIMAP(n_inliers=10,
                          n_outliers=5,
                          n_random=5).fit_transform(digits.data)
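When several parameters are adjusted at once, it can help to start from the documented defaults and override only what changes. The sketch below collects the defaults listed above in a dictionary and rejects unknown names. The helper `make_params` is hypothetical, and passing the result to `trimap.TRIMAP(**params)` assumes the installed version accepts exactly these keyword names.

```python
# Defaults copied from the parameter descriptions above.
DOCUMENTED_DEFAULTS = {
    "n_inliers": 10,
    "n_outliers": 5,
    "n_random": 5,
    "weight_adj": 500.0,
    "lr": 1000.0,
    "n_iters": 400,
    "fast_trimap": True,
    "opt_method": "dbd",
    "verbose": True,
    "return_seq": False,
}

# Optimization methods named in the description of opt_method.
VALID_OPT_METHODS = {"sd", "momentum", "dbd"}


def make_params(**overrides):
    """Return the documented defaults with the given overrides applied."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    params = {**DOCUMENTED_DEFAULTS, **overrides}
    if params["opt_method"] not in VALID_OPT_METHODS:
        raise ValueError(f"opt_method must be one of {sorted(VALID_OPT_METHODS)}")
    return params


params = make_params(n_inliers=20, verbose=False)

# Assumed usage, not executed here:
# embedding = trimap.TRIMAP(**params).fit_transform(digits.data)
```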

By default, the nearest-neighbor calculation uses ANNOY only. For more accurate results, the first 5 nearest neighbors of each point can be computed exactly using sklearn.neighbors.NearestNeighbors and combined with those found by ANNOY, although this may significantly increase the runtime. The fast_trimap argument (default = True) controls this behavior; set fast_trimap = False to enable the more accurate search.
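One way to picture combining the two neighbor lists is sketched below: the exact neighbors are kept first, and approximate neighbors fill the remaining slots without duplicates. This is an illustration under that assumption, not TriMap's internal code; `combine_neighbors` and the index lists are hypothetical.

```python
def combine_neighbors(exact, approx, n_total):
    """Merge an exact neighbor list with an approximate one.

    Exact neighbors come first; approximate neighbors that are not already
    present are appended until n_total indices have been collected.
    """
    merged = list(exact)
    seen = set(exact)
    for idx in approx:
        if len(merged) >= n_total:
            break
        if idx not in seen:
            merged.append(idx)
            seen.add(idx)
    return merged[:n_total]


# Made-up neighbor indices for a single query point.
exact_first_5 = [3, 7, 1, 9, 4]
approx_10 = [3, 1, 8, 7, 2, 6, 5, 0, 11, 12]

neighbors = combine_neighbors(exact_first_5, approx_10, n_total=10)
print(neighbors)
```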

Examples

The following are some results on real-world datasets. The values of nearest-neighbor accuracy and global score are shown as a pair (NN, GS) on top of each figure. For more results, please refer to our paper.

USPS Handwritten Digits (n = 11,000, d = 256)

20 News Groups (n = 18,846, d = 100)

Visualizations of the 20 News Groups dataset

Tabula Muris (n = 53,760, d = 23,433)

Visualizations of the Tabula Muris Mouse Tissues dataset

MNIST Handwritten Digits (n = 70,000, d = 784)

Fashion MNIST (n = 70,000, d = 784)

Visualizations of the Fashion MNIST dataset

TV News (n = 129,685, d = 100)

The runtimes of t-SNE, LargeVis, UMAP, and TriMap, in hh:mm:ss format, on a single machine with a 2.6 GHz Intel Core i5 CPU and 16 GB of memory are given in the following table. We limit the runtime of each method to 12 hours. UMAP also runs out of memory on datasets larger than ~4M points.

Runtime of TriMap compared to other methods
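To record runtimes in the same hh:mm:ss format on your own machine, a small timing helper along these lines may be useful. It is a generic sketch, not the benchmarking code behind the table; `format_hms` and `timed` are hypothetical names.

```python
import time


def format_hms(seconds):
    """Format a duration in seconds as hh:mm:ss."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed time formatted as hh:mm:ss)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, format_hms(elapsed)


# Assumed usage with TriMap, not executed here:
# embedding, runtime = timed(trimap.TRIMAP().fit_transform, digits.data)
```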

Installing

Requirements:

numpy
scikit-learn
numba
annoy
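Before installing, you can check which of these requirements are already importable. The sketch below uses only the standard library; note that scikit-learn is imported under the name `sklearn`. The helper `missing_modules` is hypothetical.

```python
import importlib.util

# Import names for the requirements listed above.
REQUIRED_MODULES = ["numpy", "sklearn", "numba", "annoy"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing requirements:", ", ".join(missing))
else:
    print("All requirements are importable.")
```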

Install Options

If you have all the requirements installed, you can use pip:

sudo pip install trimap

Please check for updates regularly and make sure you are using the most recent version. If you already have TriMap installed and would like to upgrade to the newest version, use the command:

sudo pip install --upgrade --force-reinstall trimap

Alternatively, install the dependencies manually using Anaconda and then use pip to install TriMap:

conda install numpy
conda install scikit-learn
conda install numba
conda install annoy
pip install trimap

For a manual install, first get the package:

wget https://github.com/eamid/trimap/archive/master.zip
unzip master.zip
rm master.zip
cd trimap-master

Install the requirements:

sudo pip install -r requirements.txt

or

conda install scikit-learn numba annoy

Install the package

python setup.py install

Support and Contribution

This implementation is still a work in progress. Any comments, suggestions, or bug reports are highly appreciated. Please feel free to contact me at: eamid@ucsc.edu. If you would like to contribute to the code, please fork the project and send me a pull request.

License

Please see the LICENSE file.

Release history

1.4.3.dev1 pre-release

Nov 18, 2018

1.4.2.dev1 pre-release

Sep 29, 2018

1.4.1.dev1 pre-release

Sep 29, 2018

1.4.0.dev1 pre-release

Jun 10, 2018

1.3.0.dev1 pre-release

Apr 28, 2018

1.1.4

Apr 21, 2022

1.1.3

Mar 25, 2022

1.1.2

Feb 16, 2022

1.1.1

Feb 15, 2022

1.1.0

Feb 15, 2022

1.0.15

Mar 20, 2021

1.0.14

Apr 15, 2020

1.0.13

Feb 4, 2020

1.0.12

Oct 14, 2019

1.0.11

Oct 7, 2019

1.0.10

Oct 5, 2019

1.0.9

Oct 4, 2019

This version

1.0.8

Oct 3, 2019

1.0.7

Sep 29, 2019

1.0.6

Sep 29, 2019

1.0.5

Sep 28, 2019

1.0.4

Jul 20, 2019

1.0.3

Jul 20, 2019

1.0.2

Jul 20, 2019

1.0.1

Jul 20, 2019

Download files

Source Distribution

trimap-1.0.8.tar.gz (10.9 kB)

Uploaded Oct 3, 2019 Source

Hashes for trimap-1.0.8.tar.gz
Algorithm	Hash digest
SHA256	`914d743ddc45d2f8848d4043b72026c618c4ec68f45a9ef2383577ede92e297d`
MD5	`1847a9fd9991d8932f3a79bedda193ab`
BLAKE2b-256	`957b4e0ab1299bf83dc75f84c7f6b8149dac6ec0e4bb428d3054897804402d78`
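A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard library; the file name passed to `verify_download` is an assumption about where the archive was saved, and both helper names are hypothetical.

```python
import hashlib

# SHA256 digest listed above for trimap-1.0.8.tar.gz.
EXPECTED_SHA256 = "914d743ddc45d2f8848d4043b72026c618c4ec68f45a9ef2383577ede92e297d"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected one."""
    return sha256_of_file(path) == expected.lower()


# Assumed usage, not executed here:
# print(verify_download("trimap-1.0.8.tar.gz"))
```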