
An Implementation of Component-wise Peak Finding Clustering Method


CPFcluster

An implementation of the Component-wise Peak-Finding (CPF) clustering method, presented in 'Scalable and Adaptable Density-Based Clustering using Level Set and Mode-Seeking Methods'.

Dependencies

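CPFcluster supports Python 3 and depends on numpy, scipy and scikit-learn. It also uses the standard-library modules itertools and multiprocessing, which need no separate installation. numpy and scipy should be linked against a BLAS implementation (e.g., OpenBLAS, ATLAS, Intel MKL).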

Installation

CPFcluster is available on PyPI, the Python Package Index.

$ pip install CPFcluster

How To Use

To use CPFcluster, first import the CPFcluster module.

    from CPFcluster import CPFcluster

Clustering a Dataset

A CPFcluster object is constructed with the clustering parameters, and its fit method is then called to cluster a dataset.

    CPF = CPFcluster(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
    CPF.fit(X)

The CPFcluster constructor takes six arguments:

  • k: Number of nearest neighbors used to build the connected components of the dataset and to compute the density.
  • rho (defaults to 0.4): Parameter used in the threshold for center selection.
  • alpha (defaults to 1): Optional parameter used in the edge-weight threshold for center selection; not discussed in the paper.
  • n_jobs (defaults to 1): Number of cores to run on.
  • remove_duplicates (defaults to False): Whether to remove duplicate rows from the data before clustering.
  • cutoff (defaults to 1): Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.
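The roles of k and cutoff can be illustrated with a small standard-library sketch. It is not the package's implementation: the function name mutual_knn_components is hypothetical, the neighbor search is brute force, and it assumes that an edge is kept only when two points list each other among their k nearest neighbors (a mutual k-NN graph), which the text above does not state explicitly.

```python
import math

def mutual_knn_components(points, k, cutoff=1):
    """Label each point with a component index, or NaN if dropped as an outlier.

    Illustrative only: assumes a mutual k-NN graph and brute-force search.
    """
    n = len(points)
    # k nearest neighbours of every point, excluding the point itself
    neighbors = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        neighbors.append(set(order[:k]))
    # keep an edge only when each endpoint lists the other
    adjacency = [{j for j in neighbors[i] if i in neighbors[j]} for i in range(n)]
    # instances with fewer edges than the cutoff are treated as outliers
    kept = [len(adjacency[i]) >= cutoff for i in range(n)]

    labels = [math.nan] * n
    next_label = 0
    for start in range(n):
        if not kept[start] or not math.isnan(labels[start]):
            continue
        # depth-first traversal over the remaining graph
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in adjacency[i]:
                if kept[j] and math.isnan(labels[j]):
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels

# Two tight groups of three points and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = mutual_knn_components(points, k=2, cutoff=1)
```

In this toy dataset the isolated point has no mutual neighbors, so with cutoff=1 it has fewer edges than the cutoff and receives NaN instead of a component index.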

The CPFcluster object is then fit to a dataset:

  • X: An n-by-d numpy.ndarray of training data. Rows correspond to the n observations and columns to the d dimensions.

After fitting, the result object contains:

  • CCmat: An n-by-n sparse matrix representation of the k-NN graph.
  • components: A vector containing the index of the component to which each instance belongs. For an outlying instance, the value is NaN.
  • ps: A list of tuples, one per component, containing the number of instances and the proportion of instances for which no point of higher density was present among the nearest neighbors.
  • peaks: A vector containing the indices of the peaks selected as cluster centers.
  • memberships: The final cluster labels.
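Because components uses NaN to mark outliers, downstream code usually needs to separate those rows before grouping. The following is a minimal sketch of one way to do that. The helper name group_by_component and the example vector are hypothetical, and a plain list stands in for the vector produced by the package.

```python
import math
from collections import defaultdict

def group_by_component(components):
    """Return ({component index: [row indices]}, [outlier row indices])."""
    groups = defaultdict(list)
    outliers = []
    for row, value in enumerate(components):
        if math.isnan(value):
            # NaN marks an outlying instance
            outliers.append(row)
        else:
            groups[int(value)].append(row)
    return dict(groups), outliers

# Hypothetical components vector for seven rows; the last row is an outlier.
components = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0, float("nan")]
groups, outliers = group_by_component(components)
```

The same helper could be applied to memberships if that vector follows the same NaN convention, which the description above does not confirm.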

CPFmatch for Multi-Image Matching

CPFmatch is a modified version of CPF for the multi-image matching problem. To use CPFmatch, first import the CPFmatch module.

    from CPFcluster import CPFmatch

Clustering a Dataset

A CPFmatch object is constructed with the clustering parameters, and its fit method is then called to cluster a dataset.

    match = CPFmatch(k, rho, alpha, n_jobs, remove_duplicates, cutoff)
    match.fit(X, img_label)

The CPFmatch constructor takes the same six arguments as CPFcluster:

  • k: Number of nearest neighbors used to build the connected components of the dataset and to compute the density.
  • rho (defaults to 0.4): Parameter used in the threshold for center selection.
  • alpha (defaults to 1): Optional parameter used in the edge-weight threshold for center selection; not discussed in the paper.
  • n_jobs (defaults to 1): Number of cores to run on.
  • remove_duplicates (defaults to False): Whether to remove duplicate rows from the data before clustering.
  • cutoff (defaults to 1): Threshold for removing instances as outliers. Instances with fewer edges than the cutoff value are removed.

The CPFmatch object is then fit to a dataset together with the image label of each row:

  • X: An n-by-d numpy.ndarray of training data. Rows correspond to the n observations and columns to the d dimensions.
  • img_label: An n-by-1 numpy.ndarray with the image label of each feature. Rows correspond to the n keypoints, and no two keypoints from the same image will be clustered together.
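The constraint that no two keypoints from the same image share a cluster can be checked after fitting. The sketch below is one possible check, not part of the package: same_image_conflicts is a hypothetical helper, plain lists stand in for the label vectors, and NaN entries (if any) are skipped.

```python
from collections import defaultdict

def same_image_conflicts(memberships, img_label):
    """Return sorted (cluster, image) pairs where a cluster holds 2+ keypoints of one image."""
    counts = defaultdict(int)
    for cluster, image in zip(memberships, img_label):
        if cluster != cluster:  # NaN is the only value not equal to itself
            continue
        counts[(cluster, image)] += 1
    return sorted(pair for pair, count in counts.items() if count > 1)

# Hypothetical labels for five keypoints taken from images 0 and 1.
img_label = [0, 1, 0, 1, 1]
valid = same_image_conflicts([0, 0, 1, 1, 2], img_label)
invalid = same_image_conflicts([0, 0, 1, 1, 1], img_label)
```

An empty result means the labeling respects the constraint. In the second call, cluster 1 contains two keypoints from image 1, so that pair is reported.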

As before, the result object contains:

  • CCmat: An n-by-n sparse matrix representation of the k-NN graph.
  • components: A vector containing the index of the component to which each instance belongs. For an outlying instance, the value is NaN.
  • ps: A list of tuples, one per component, containing the number of instances and the proportion of instances for which no point of higher density was present among the nearest neighbors.
  • peaks: A vector containing the indices of the peaks selected as cluster centers.
  • memberships: The final cluster labels.
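For multi-image matching, it is often convenient to read each cluster as a set of corresponding keypoints, one per image. The sketch below shows one way to reshape the labels into that form. The helper clusters_to_tracks and the example data are hypothetical, and plain lists stand in for memberships and img_label.

```python
from collections import defaultdict

def clusters_to_tracks(memberships, img_label):
    """Map each cluster label to {image label: keypoint row index}."""
    tracks = defaultdict(dict)
    for row, (cluster, image) in enumerate(zip(memberships, img_label)):
        if cluster != cluster:  # skip NaN entries, if any
            continue
        # at most one keypoint per image is expected in each cluster
        tracks[int(cluster)][image] = row
    return dict(tracks)

# Hypothetical result for five keypoints spread over three images.
memberships = [0, 1, 0, 1, 0]
img_label = [0, 0, 1, 1, 2]
tracks = clusters_to_tracks(memberships, img_label)
```

Here cluster 0 links rows 0, 2 and 4 across images 0, 1 and 2, while cluster 1 links rows 1 and 3 across images 0 and 1.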

Tests

License

CPFcluster has an MIT License.

See LICENSE.
