An Implementation of the Component-wise Peak-Finding Clustering Method

Project description

CPFcluster

An implementation of the Component-wise Peak-Finding (CPF) clustering method, presented in 'Scalable and Adaptable Density-Based Clustering using Level Set and Mode-Seeking Methods'.

Dependencies

CPFcluster supports Python 3 and depends on numpy, scipy and scikit-learn, along with the standard-library modules itertools and multiprocessing. numpy and scipy should be linked with a BLAS implementation (e.g., OpenBLAS, ATLAS, Intel MKL).
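One way to check which BLAS library a numpy installation was built against is numpy's own show_config function. The snippet below is a quick diagnostic only and is not part of CPFcluster.

```python
import numpy as np

# Print numpy's build configuration, including the BLAS/LAPACK
# libraries that this installation was linked against.
np.show_config()
```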

Installation

CPFcluster is available on PyPI, the Python Package Index.

    $ pip install CPFcluster

How To Use

To use CPFcluster, first import the CPFcluster class from the CPFcluster package.

    from CPFcluster import CPFcluster

Clustering a Dataset

A CPFcluster object is constructed with the clustering parameters, and its fit method is then called to compute a clustering of a dataset.

    CPF = CPFcluster(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
    CPF.fit(X)

CPFcluster takes 6 arguments:

k Number of nearest-neighbors used to create connected components from the dataset and compute the density.
rho (Defaults to 0.4) Parameter used in threshold for center selection.
alpha (Defaults to 1) Optional parameter used in the threshold on edge weights for center selection; not discussed in the paper.
n_jobs (Defaults to 1) Number of cores on which the program executes.
remove_duplicates (Defaults to False) Option to remove duplicate rows from data in advance of clustering.
cutoff (Defaults to 1) Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.
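To make the role of cutoff concrete, the sketch below counts edges per instance in a small graph and flags instances with fewer edges than the cutoff. It assumes a mutual k-NN graph (an edge is kept only when each point is among the other's k nearest neighbours); that choice, the brute-force distance computation, and the helper name mutual_knn_degrees are assumptions of this illustration, not a description of the package internals.

```python
import numpy as np

def mutual_knn_degrees(X, k):
    """Count, for each row of X, its edges in a mutual k-NN graph."""
    # Pairwise Euclidean distances (brute force; fine for small n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    # Indices of the k nearest neighbours of each point.
    nn = np.argsort(dist, axis=1)[:, :k]
    n = X.shape[0]
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.repeat(np.arange(n), k), nn.ravel()] = True
    # Assumption: keep an edge only when each endpoint lists the other.
    mutual = is_nn & is_nn.T
    return mutual.sum(axis=1)

# Four points on a unit square plus one far-away point.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]], dtype=float)
degrees = mutual_knn_degrees(X, k=2)

cutoff = 1
outliers = np.flatnonzero(degrees < cutoff)
print(degrees.tolist(), outliers.tolist())  # [2, 2, 2, 2, 0] [4]
```

The far-away point has neighbours of its own, but none of them lists it in return, so it ends up with zero edges and falls below the cutoff.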

The CPFcluster object is then fit to a dataset:

X An n-by-d numpy.ndarray with training data. The rows correspond to n observations, and the columns correspond to d dimensions.
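A minimal end-to-end sketch follows. It assumes the constructor accepts its arguments positionally in the order listed above and that the fitted object exposes memberships as an attribute; both are inferred from this README rather than verified against the package source. The helper make_two_blobs and the choice k = 10 are hypothetical, used only to produce a small n-by-d array.

```python
import numpy as np

def make_two_blobs(n_per_blob=50, seed=0):
    """Return an n-by-d array: two Gaussian blobs in d = 2 dimensions."""
    rng = np.random.default_rng(seed)
    a = rng.normal(loc=0.0, scale=0.3, size=(n_per_blob, 2))
    b = rng.normal(loc=5.0, scale=0.3, size=(n_per_blob, 2))
    return np.vstack([a, b])

X = make_two_blobs()
print(X.shape)  # (100, 2): n = 100 observations, d = 2 dimensions

try:
    from CPFcluster import CPFcluster
except ImportError:
    CPFcluster = None  # package not installed; skip the clustering step

if CPFcluster is not None:
    # Arguments in the documented order:
    # k, rho, alpha, n_jobs, remove_duplicates, cutoff
    CPF = CPFcluster(10, 0.4, 1, 1, False, 1)
    CPF.fit(X)
    # Assumption: results are stored as attributes of the fitted object.
    print(CPF.memberships)
```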

The result object further contains:

CCmat An n-by-n sparse matrix representation of the k-NN graph.
components A vector containing the index of the component to which each instance belongs. If the instance is an outlying point, the value will be NaN.
ps A list of tuples, one per component, containing the number of instances and the proportion of instances for which no point of higher density was present among the nearest neighbours.
peaks A vector containing the index of the peaks selected as cluster centers.
memberships The final cluster labelings.
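The components vector can be pictured with a small stand-alone sketch: label the connected components of a graph and leave NaN for outlying points. The helper label_components, the edge list, and the breadth-first search are hypothetical choices for illustration; how the package computes components internally is not shown in this README. Note that a vector holding NaN must have a floating-point dtype.

```python
from collections import deque

import numpy as np

def label_components(n, edges, outliers):
    """Label connected components; outlying points get NaN."""
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    components = np.full(n, np.nan)  # float dtype so NaN can be stored
    skip = set(outliers)
    next_label = 0
    for start in range(n):
        if start in skip or not np.isnan(components[start]):
            continue
        # Breadth-first search from an unlabelled, non-outlying point.
        components[start] = next_label
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbours[i]:
                if j not in skip and np.isnan(components[j]):
                    components[j] = next_label
                    queue.append(j)
        next_label += 1
    return components

edges = [(0, 1), (1, 2), (3, 4)]
components = label_components(6, edges, outliers=[5])

# Points 0-2 share component 0, points 3-4 share component 1,
# and the outlier (point 5) is NaN.
is_outlier = np.isnan(components)
print(components, is_outlier)
```

Downstream code can use np.isnan(components) as a mask to separate outliers from clustered instances.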

CPFmatch for Multi-Image Matching

CPFmatch is a modified version of CPF for the multi-image matching problem. To use CPFmatch, first import the CPFmatch class from the CPFcluster package.

    from CPFcluster import CPFmatch

Clustering a Dataset

A CPFmatch object is constructed with the clustering parameters, and its fit method is then called to compute a clustering of a dataset.

    match = CPFmatch(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
    match.fit(X, img_label)

CPFmatch takes the same 6 arguments as CPFcluster:

k Number of nearest-neighbors used to create connected components from the dataset and compute the density.
rho (Defaults to 0.4) Parameter used in threshold for center selection.
alpha (Defaults to 1) Optional parameter used in the threshold on edge weights for center selection; not discussed in the paper.
n_jobs (Defaults to 1) Number of cores on which the program executes.
remove_duplicates (Defaults to False) Option to remove duplicate rows from data in advance of clustering.
cutoff (Defaults to 1) Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.
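For readers who want to see what removing duplicate rows involves, here is a sketch using numpy.unique. Whether the package deduplicates this way internally is not stated in this README; the example only shows one way to drop repeated rows and to map results computed on the deduplicated data back to the original rows.

```python
import numpy as np

X = np.array([[0.0, 1.0], [2.0, 3.0], [0.0, 1.0], [4.0, 5.0]])

# Unique rows (sorted lexicographically) plus, for every original row,
# the position of its representative in the deduplicated array.
X_unique, inverse = np.unique(X, axis=0, return_inverse=True)
inverse = inverse.reshape(-1)  # guard against a 2-D inverse in some numpy versions

# Hypothetical labels computed on the deduplicated rows ...
labels_unique = np.array([7, 8, 9])
# ... expanded back so every original row, duplicates included, has a label.
labels_all = labels_unique[inverse]
print(X_unique.shape, labels_all.tolist())  # (3, 2) [7, 8, 7, 9]
```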

The CPFmatch object is then fit to a dataset together with the image label of each feature:

X An n-by-d numpy.ndarray with training data. The rows correspond to n observations, and the columns correspond to d dimensions.
img_label An n-by-1 numpy.ndarray with the image label for each feature. The rows correspond to n keypoints, and no two keypoints from the same image will be clustered together.
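The sketch below shows one way to assemble X and img_label from per-image feature arrays, and a check that a given labelling never places two keypoints from the same image in one cluster. The helpers build_inputs and same_image_conflicts, the integer image indices, and the example labels are all hypothetical; they are not part of the CPFcluster API.

```python
import numpy as np

def build_inputs(features_per_image):
    """Stack per-image feature arrays and record each row's image index."""
    X = np.vstack(features_per_image)
    counts = [f.shape[0] for f in features_per_image]
    # n-by-1 column: image index repeated once per keypoint of that image.
    img_label = np.repeat(np.arange(len(counts)), counts).reshape(-1, 1)
    return X, img_label

def same_image_conflicts(memberships, img_label):
    """Return (cluster, image) pairs that occur more than once."""
    pairs = np.column_stack([np.ravel(memberships), np.ravel(img_label)])
    uniq, counts = np.unique(pairs, axis=0, return_counts=True)
    return uniq[counts > 1]

# Two images with 3 and 2 keypoints, each described by 4 features.
feats = [np.zeros((3, 4)), np.ones((2, 4))]
X, img_label = build_inputs(feats)
print(X.shape, img_label.shape)  # (5, 4) (5, 1)

# Hypothetical integer cluster labels, one per keypoint.
ok = np.array([0, 1, 2, 0, 1])   # each cluster has at most one keypoint per image
bad = np.array([0, 0, 2, 0, 1])  # cluster 0 holds two keypoints of image 0
print(same_image_conflicts(ok, img_label).shape[0])  # 0
print(same_image_conflicts(bad, img_label).tolist())  # [[0, 0]]
```

The check assumes integer labels without NaN entries; any outlier markers would need to be filtered out first.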

As before, the result object further contains:

CCmat An n-by-n sparse matrix representation of the k-NN graph.
components A vector containing the index of the component to which each instance belongs. If the instance is an outlying point, the value will be NaN.
ps A list of tuples, one per component, containing the number of instances and the proportion of instances for which no point of higher density was present among the nearest neighbours.
peaks A vector containing the index of the peaks selected as cluster centers.
memberships The final cluster labelings.

Tests

License

CPFcluster has an MIT License.

See LICENSE.

Release history

2.0 (this version) Oct 4, 2021
1.0.0 Feb 10, 2021
0.8.0 Feb 10, 2021
0.7.0 Feb 10, 2021
0.6.0 Feb 10, 2021
0.5.0 Feb 10, 2021
0.4.0 Feb 9, 2021
0.3.0 Feb 7, 2021
0.2.0 Sep 9, 2020
0.1.0 Sep 7, 2020

Download files

Source Distribution

CPFcluster-2.0.tar.gz (6.5 kB)

Uploaded Oct 4, 2021 Source

Hashes for CPFcluster-2.0.tar.gz
Algorithm	Hash digest
SHA256	`0ce152cc040cab8cd3da789ce530748c67dab32bd5d82b82f512a1b7b0351550`
MD5	`9463bf23765a2d91f075fb6278cfe1af`
BLAKE2b-256	`6032590663ce4b0a91bc1d8ecd4893f1fe8cb7ae3d27ce99d58eb1d669fc3c26`