Optimal univariate (1D) clustering based on Ckmeans.1d.dp

CKmeans: Optimal Univariate Clustering

Ckmeans clustering is an improvement on 1-dimensional (univariate) heuristic-based clustering approaches such as Jenks. The algorithm was developed by Haizhou Wang and Mingzhou Song (2011) as a dynamic programming approach to the problem of clustering numeric data into groups with the least within-group sum-of-squared-deviations.

Minimizing the difference within groups – what Wang & Song refer to as withinss, or within sum-of-squares – means that each group is as homogeneous as possible and that the data is split into representative groups. This is very useful for visualization, where one may wish to represent a continuous variable in discrete colour or style groups. The resulting groups emphasize differences in the data.
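To make the quantity being minimized concrete, here is a minimal pure-Python sketch of the within sum-of-squares for a given grouping. The helper name withinss is chosen here for illustration and is not part of this library's API.

```python
def withinss(groups):
    """Total within-group sum of squared deviations from each group's mean."""
    total = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        total += sum((x - mean) ** 2 for x in group)
    return total


# Two ways of splitting the same eight values into two groups.
good_split = [[1.0, 2.0, 3.0, 4.0], [100.0, 101.0, 102.0, 103.0]]
bad_split = [[1.0, 2.0, 3.0, 4.0, 100.0], [101.0, 102.0, 103.0]]

print(withinss(good_split))  # 10.0
print(withinss(bad_split))   # 7612.0
```

The split with the smaller total is the more homogeneous one; an optimal clustering is the split with the smallest total among all splits into the requested number of groups.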

Being a dynamic programming approach, this algorithm is based on two matrices that store incrementally computed values: one for squared deviations and one for backtracking indexes.
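The two-matrix idea can be sketched in plain Python as follows. This is a simplified O(k·n²) illustration written for this description, not the code used by this library or by the original Ckmeans.1d.dp package; the names ckmeans_sketch, cost and back are hypothetical.

```python
def ckmeans_sketch(values, k):
    """Illustrative dynamic-programming clustering of 1D data into k groups."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the squared deviations of any slice xs[i:j] in O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        prefix[i + 1] = prefix[i] + x
        prefix_sq[i + 1] = prefix_sq[i] + x * x

    def ssq(i, j):
        """Sum of squared deviations from the mean for xs[i:j], with j > i."""
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # cost[m][j]: smallest withinss when the first j points form m clusters.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][j]: start index of the last cluster in that best arrangement.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0

    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                candidate = cost[m - 1][i] + ssq(i, j)
                if candidate < cost[m][j]:
                    cost[m][j] = candidate
                    back[m][j] = i

    # Follow the backtracking matrix from the end to recover the clusters.
    clusters = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        clusters.append(xs[i:j])
        j = i
    clusters.reverse()
    return clusters


print(ckmeans_sketch([103.0, 1.0, 100.0, 2.0, 101.0, 3.0, 102.0, 4.0], 2))
```

The sketch sorts its input first, because in one dimension an optimal cluster always covers a contiguous run of the sorted values; that is what lets the two matrices be filled in one prefix at a time.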

Unlike the original implementation, this implementation does not include any code to automatically determine the optimal number of clusters: this information needs to be explicitly provided. It does provide the roundbreaks method to aid labelling, however.

Implementation

This library uses the ckmeans Rust crate, by the same author.

Benchmarks

Install optional dependencies, then run benchmark.py.

ckmeans-1d-dp is about 20 % faster, but note that it only returns the index of the cluster to which each input value belongs. If you actually want your data grouped into clusters, you need to do that yourself, which I strongly suspect might be slower overall. On the other hand, if all you want is indices, it may be a better choice.
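For illustration, the extra grouping step could look like the sketch below. It assumes the indices arrive as one integer label per input value; that layout, and the helper name group_by_label, are assumptions made here rather than a description of the ckmeans-1d-dp API.

```python
def group_by_label(values, labels):
    """Collect values into lists, one list per distinct cluster label."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    # Return the groups ordered by label so the output is deterministic.
    return [groups[label] for label in sorted(groups)]


values = [1.0, 2.0, 100.0, 3.0, 101.0]
labels = [0, 0, 1, 0, 1]  # hypothetical per-value cluster indices
print(group_by_label(values, labels))  # [[1.0, 2.0, 3.0], [100.0, 101.0]]
```

This pass is linear in the number of values, but it is work that a benchmark of the index-only output does not measure.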

Example

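from ckmeans import ckmeans
import numpy as np


data = np.array([1.0, 2.0, 3.0, 4.0, 100.0, 101.0, 102.0, 103.0])
clusters = 2
result = ckmeans(data, clusters)
expected = [
    np.array([1.0, 2.0, 3.0, 4.0]),
    np.array([100.0, 101.0, 102.0, 103.0])
]
# Compare cluster by cluster: `==` on lists of multi-element arrays raises ValueError
assert len(result) == len(expected)
assert all(np.array_equal(r, e) for r, e in zip(result, expected))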
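For the visualization use case mentioned in the description, clusters like these can be turned into class breaks and used to assign a colour class to any value. The sketch below uses only the standard library; breaks_from_clusters and classify are hypothetical helpers, not functions provided by this package, and the choice of each cluster's minimum as the break is just one possible convention.

```python
from bisect import bisect_right


def breaks_from_clusters(clusters):
    """Lower bound of every cluster except the first, in ascending order."""
    ordered = sorted(clusters, key=min)
    return [min(cluster) for cluster in ordered[1:]]


def classify(value, breaks):
    """Index of the class a value falls into, given ascending breaks."""
    return bisect_right(breaks, value)


clusters = [[1.0, 2.0, 3.0, 4.0], [100.0, 101.0, 102.0, 103.0]]
breaks = breaks_from_clusters(clusters)
palette = ["#fee8c8", "#e34a33"]  # one colour per class, chosen arbitrarily

print(breaks)                          # [100.0]
print(palette[classify(3.5, breaks)])  # #fee8c8
print(palette[classify(100.0, breaks)])  # #e34a33
```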

Download files

Source Distribution

ckmeans-0.2.5.tar.gz (9.5 kB): source

Built Distributions

ckmeans-0.2.5-cp310-abi3-win_amd64.whl (142.5 kB): CPython 3.10+, Windows x86-64

ckmeans-0.2.5-cp310-abi3-win32.whl (133.0 kB): CPython 3.10+, Windows x86

ckmeans-0.2.5-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (610.0 kB): CPython 3.10+, manylinux (glibc 2.17+), x86-64

ckmeans-0.2.5-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (614.6 kB): CPython 3.10+, manylinux (glibc 2.17+), ARM64

ckmeans-0.2.5-cp310-abi3-manylinux_2_5_i686.manylinux1_i686.whl (618.1 kB): CPython 3.10+, manylinux (glibc 2.5+), i686

ckmeans-0.2.5-cp310-abi3-macosx_11_0_arm64.whl (228.9 kB): CPython 3.10+, macOS 11.0+, ARM64

ckmeans-0.2.5-cp310-abi3-macosx_10_12_x86_64.whl (255.9 kB): CPython 3.10+, macOS 10.12+, x86-64