k-Medoids Clustering in Python with FasterPAM
This Python package implements k-medoids clustering with PAM, along with variants of clustering by direct optimization of the (Medoid) Silhouette. It can be used with arbitrary dissimilarities, because it takes a dissimilarity matrix as input.
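As an illustration of what "arbitrary dissimilarities" can mean, the following minimal sketch builds a symmetric dissimilarity matrix for strings using the Hamming distance. The helpers `hamming` and `pairwise_matrix` are hypothetical names introduced here for illustration; they are not part of this package.

```python
# Sketch: build a symmetric dissimilarity matrix for non-vector data (strings).
# hamming() and pairwise_matrix() are hypothetical helpers, not part of kmedoids.

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_matrix(items, dissimilarity):
    """Return an n x n list of lists holding dissimilarity(items[i], items[j])."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = float(dissimilarity(items[i], items[j]))
            matrix[i][j] = d
            matrix[j][i] = d  # assumption: the dissimilarity is symmetric
    return matrix


words = ["karolin", "kathrin", "kerstin", "carolin"]
distmatrix = pairwise_matrix(words, hamming)
```

The examples further down pass a NumPy array to the clustering functions, so a nested list like this would presumably need converting first (for instance with `numpy.asarray(distmatrix)`); that conversion step is an assumption, not something this README states.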
This software package has been introduced in JOSS:
Erich Schubert and Lars Lenssen
Fast k-medoids Clustering in Rust and Python
Journal of Open Source Software 7(75), 4183
https://doi.org/10.21105/joss.04183 (open access)
For further details on the implemented algorithm FasterPAM, see:
Erich Schubert, Peter J. Rousseeuw
Fast and Eager k-Medoids Clustering:
O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms
Information Systems (101), 2021, 101804
https://doi.org/10.1016/j.is.2021.101804 (open access)
An earlier (slower, and now obsolete) version was published as:
Erich Schubert, Peter J. Rousseeuw:
Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms
In: 12th International Conference on Similarity Search and Applications (SISAP 2019), 171-187.
https://doi.org/10.1007/978-3-030-32047-8_16
Preprint: https://arxiv.org/abs/1810.05691
This is a port of the original Java code from ELKI to Rust. The Rust version is then wrapped for use with Python.
For further details on medoid Silhouette clustering with FasterMSC, see:
Lars Lenssen, Erich Schubert:
Clustering by Direct Optimization of the Medoid Silhouette
In: 15th International Conference on Similarity Search and Applications (SISAP 2022)
https://doi.org/10.1007/978-3-031-17849-8_15
If you use this code in scientific work, please cite the papers above. Thank you.
Documentation
Full Python documentation is included and is also available at python-kmedoids.readthedocs.io.
Installation
Installation with pip or conda
Pre-built packages for many Linux, Windows, and OSX systems are available on PyPI and conda-forge. Install them with
pip install kmedoids
or
conda install -c conda-forge kmedoids
On uncommon architectures, you may need to install Cargo (the package manager that ships with the Rust toolchain) first. A subsequent
pip install kmedoids
will then try to compile the package for your CPU architecture and operating system.
Compilation from source
You need to have Python 3 installed.
Unless you already have Rust, install Rust/Cargo.
Installation uses maturin for compiling and installing the Rust extension. Maturin is best used within a Python virtual environment:
# activate your desired virtual environment first, then:
pip install maturin
git clone https://github.com/kno10/python-kmedoids.git
cd python-kmedoids
# build and install the package:
maturin develop --release
Run the integration tests to validate the installation:
pip install numpy
python -m unittest discover tests
This procedure uses the latest git version from https://github.com/kno10/rust-kmedoids.
If you want to use local modifications to the Rust code, you need to provide the source folder of the Rust module in Cargo.toml by setting the path= option of the kmedoids dependency.
Example
Given a distance matrix distmatrix, cluster it into k = 5 clusters:
import kmedoids
c = kmedoids.fasterpam(distmatrix, 5)
print("Loss is:", c.loss)
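To make the reported loss easier to interpret: the k-medoids objective optimized by PAM is the sum, over all points, of the dissimilarity to the nearest medoid. The sketch below recomputes that quantity in plain Python on a toy example. The helpers `total_deviation` and `assign` are hypothetical names used only for illustration, and this is not the package's implementation.

```python
# Sketch: recompute the k-medoids objective for a given set of medoid indices.
# total_deviation() and assign() are hypothetical helpers, not part of kmedoids.

def total_deviation(distmatrix, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in distmatrix)


def assign(distmatrix, medoids):
    """For each point, the position (within `medoids`) of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: row[medoids[c]])
        for row in distmatrix
    ]


# Toy data: six points on a line, forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
toy = [[abs(p - q) for q in points] for p in points]

medoids = [1, 4]                      # the points at positions 1 and 11
loss = total_deviation(toy, medoids)  # 1 + 0 + 1 + 1 + 0 + 1 = 4
labels = assign(toy, medoids)         # [0, 0, 0, 1, 1, 1]
```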
Using the sklearn-compatible API
Note that KMedoids defaults to the "precomputed" metric, expecting a pairwise distance matrix.
If you have sklearn installed, you can also use metric="euclidean" and other distances supported by sklearn.
import kmedoids
km = kmedoids.KMedoids(5, method='fasterpam')
c = km.fit(distmatrix)
print("Loss is:", c.inertia_)
MNIST (10k samples)
import kmedoids, numpy, time
from sklearn.datasets import fetch_openml
from sklearn.metrics.pairwise import euclidean_distances
X, _ = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X[:10000]
diss = euclidean_distances(X)
start = time.time()
fp = kmedoids.fasterpam(diss, 100)
print("FasterPAM took: %.2f ms" % ((time.time() - start)*1000))
print("Loss with FasterPAM:", fp.loss)
start = time.time()
pam = kmedoids.pam(diss, 100)
print("PAM took: %.2f ms" % ((time.time() - start)*1000))
print("Loss with PAM:", pam.loss)
Implemented Algorithms
- FasterPAM (Schubert and Rousseeuw, 2020, 2021)
- FastPAM1 (Schubert and Rousseeuw, 2019, 2021)
- PAM (Kaufman and Rousseeuw, 1987) with BUILD and SWAP
- Alternating optimization (k-means-style algorithm)
- Silhouette index for evaluation (Rousseeuw, 1987)
- FasterMSC (Lenssen and Schubert, 2022)
- FastMSC (Lenssen and Schubert, 2022)
- PAMSIL (Van der Laan and Pollard, 2003)
- PAMMEDSIL (Van der Laan and Pollard, 2003)
- Medoid Silhouette index for evaluation (Van der Laan and Pollard, 2003)
Note that the k-means-like algorithm for k-medoids (alternating optimization) tends to find much worse solutions than the PAM-based algorithms.
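One way to see why is that the alternating scheme only ever re-picks a medoid from within its current cluster, so it cannot move a medoid to a different region of the data. The following plain-Python sketch shows this on a hand-picked toy example with a deliberately poor initialization. The function `alternating` is a hypothetical illustration written for this note, not the package's implementation, and a single toy case does not say how often this happens in practice.

```python
from itertools import combinations

# Sketch of k-means-style alternating optimization for k-medoids.
# alternating() is a hypothetical illustration, not the kmedoids implementation.

def total_deviation(distmatrix, medoids):
    """Sum over all points of the dissimilarity to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in distmatrix)


def alternating(distmatrix, medoids, max_iter=100):
    """Alternate between assigning points and re-picking each cluster's medoid."""
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = [[] for _ in medoids]
        for i, row in enumerate(distmatrix):
            c = min(range(len(medoids)), key=lambda c: row[medoids[c]])
            clusters[c].append(i)
        # Update step: the new medoid is the member with the smallest
        # distance sum to the other members of its own cluster.
        new_medoids = []
        for old, members in zip(medoids, clusters):
            if not members:           # keep the old medoid if a cluster is empty
                new_medoids.append(old)
                continue
            new_medoids.append(
                min(members, key=lambda cand: sum(distmatrix[cand][j] for j in members))
            )
        if new_medoids == medoids:    # converged: no medoid changed
            break
        medoids = new_medoids
    return medoids, total_deviation(distmatrix, medoids)


# Toy data: four nearby points and one far-away point.
points = [0, 1, 2, 3, 100]
toy = [[abs(p - q) for q in points] for p in points]

# Poor start: both medoids lie in the left group.
stuck_medoids, stuck_loss = alternating(toy, [0, 3])   # stays at [0, 3], loss 99

# Exhaustive search over all medoid pairs, feasible only for tiny inputs.
best_loss = min(total_deviation(toy, pair) for pair in combinations(range(len(points)), 2))  # 4
```

In this example, a single swap of the medoid at index 0 for the point at index 4 would already reduce the loss from 99 to 6, which is the kind of move that swap-based methods such as PAM consider and the alternating scheme does not.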
Contributing to python-kmedoids
Third-party contributions are welcome. Please use pull requests to submit patches.
Reporting issues
Please report errors as an issue within the repository's issue tracker.
Support requests
If you need help, please submit an issue within the repository's issue tracker.
License: GPL-3 or later
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.