Noise contrastive data visualization

ncvis

NCVis is an efficient solution for data visualization and dimensionality reduction. It uses HNSW to quickly construct the nearest neighbors graph and a parallel (batched) approach to build its embedding. Efficient random sampling is achieved via PCGRandom. Detailed application examples can be found here.

Why NCVis?

It is Fast

We use preprocessed samples from the News Headlines Of India dataset to perform the comparison. Test cases are generated by taking the first 1000, 2 · 1000, ..., 2¹⁰ · 1000 samples from the dataset. Given the same amount of time, NCVis can process more than twice as many samples as other methods, visualizing 10⁶ points in only 6 minutes (12 × Intel® Core™ i7-8700K CPU @ 3.70GHz, 64 GB RAM).
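The test-case sizes described above follow a simple doubling schedule. As a minimal sketch (the function name `sample_sizes` and its parameters are hypothetical, not part of ncvis), they could be generated like this:

```python
def sample_sizes(base=1000, max_power=10):
    """Return the doubling schedule base, 2*base, ..., 2**max_power * base."""
    return [base * 2**k for k in range(max_power + 1)]


sizes = sample_sizes()
# 11 test cases, from 1000 samples up to 2**10 * 1000 = 1,024,000 samples
print(len(sizes), sizes[0], sizes[-1])
```

The largest test case is therefore slightly above 10⁶ samples.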

Speed Comparison

It is Efficient

One can define efficiency as the ratio of the time to execute the task on a single processor to the time on multiple processors. Ideally, the efficiency should be equal to the number of threads. NCVis does not achieve this limit but significantly outperforms other methods. We used 10000 samples from the News Headlines Of India dataset.
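To make the definition concrete, here is a small sketch of the ratio and of how close it comes to the ideal value. The function names and the timings are hypothetical and chosen only for illustration; they are not measurements from the comparison.

```python
def efficiency(t_single, t_multi):
    """Ratio of single-processor time to multi-processor time, as defined above."""
    if t_single <= 0 or t_multi <= 0:
        raise ValueError("timings must be positive")
    return t_single / t_multi


def fraction_of_ideal(t_single, t_multi, n_threads):
    """Share of the ideal value (the number of threads) that is actually reached."""
    return efficiency(t_single, t_multi) / n_threads


# Hypothetical timings: 12 s on one thread, 3 s on six threads.
ratio = efficiency(12.0, 3.0)
share = fraction_of_ideal(12.0, 3.0, n_threads=6)
print(f"efficiency = {ratio:.1f}, {share:.0%} of the ideal value 6")
```

With these made-up numbers the ratio is 4, about two thirds of the ideal value of 6.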

Efficiency Comparison

It is Predictable

It is important that the proposed method has predictable behavior on simple datasets. We used the Optical Recognition of Handwritten Digits Data Set, which comprises 5620 preprocessed handwritten digits and thus has a simple structure that visualization is expected to reveal. NCVis shows behavior consistent with classical methods like t-SNE while producing the visualization about an order of magnitude faster (see the timings below).

  • t-SNE (29.5s)
  • FIt-SNE (17.4s)
  • Multicore t-SNE (14.3s)
  • LargeVis (9.7s)
  • Umap (7.5s)
  • NCVis (0.9s)

Using

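import ncvis

# X is the input data: a 2-D array with one sample per row.
vis = ncvis.NCVis()
# Y holds the low-dimensional embedding of X.
Y = vis.fit_transform(X)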

More detailed examples can be found here.
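To check run times like the ones reported above on your own data, a small timing helper can wrap the `fit_transform` call. This is only a sketch: `timed_fit_transform` and `IdentityReducer` are hypothetical names introduced here, and the stand-in class exists solely so the snippet runs without ncvis installed.

```python
import time


def timed_fit_transform(reducer, X):
    """Run reducer.fit_transform(X) and return (embedding, elapsed_seconds)."""
    start = time.perf_counter()
    Y = reducer.fit_transform(X)
    return Y, time.perf_counter() - start


class IdentityReducer:
    """Hypothetical stand-in that returns its input unchanged."""

    def fit_transform(self, X):
        return X


Y, seconds = timed_fit_transform(IdentityReducer(), [[0.0, 1.0], [2.0, 3.0]])
print(f"embedded {len(Y)} samples in {seconds:.4f}s")
```

With ncvis installed, passing `ncvis.NCVis()` in place of the stand-in should time the real embedding, since the helper only relies on the `fit_transform` method shown in the usage example.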

Installation

Conda [recommended]

You do not need to set up the environment when using conda: all dependencies are installed automatically.

$ conda install alartum::ncvis 

Pip [not recommended]

Important: be sure to have a compiler with OpenMP support. GCC has it by default, but clang does not, so you may need to install the llvm-openmp library beforehand.

  1. Install numpy and cython packages (compile-time dependencies):
    $ pip install numpy cython
    
  2. Install ncvis package:
    $ pip install ncvis
    

From source [not recommended]

Important: be sure to have OpenMP available.

First of all, download the pcg-cpp and hnswlib libraries:

$ make libs

Python Wrapper

If a conda environment is used, it replaces the library search paths. To prevent compilation errors, you need to either use the compilers provided by conda or switch to pip and the system compilers.

  • Conda

    $ conda install conda-build numpy cython scipy
    $ conda install -c conda-forge cxx-compiler c-compiler
    $ conda-develop -bc .
    
  • Pip

    $ pip install numpy cython
    $ make wrapper
    

You can then use pytest to run some basic checks:

$ pytest -v recipe/test.py

C++ Binary

  • Release

    $ make ncvis
    
  • Debug

    $ make debug
    

Download files


Source Distribution

ncvis-1.5.6.tar.gz (275.8 kB)
