Cluster_Ensembles

A package for determining the consensus clustering from an ensemble of partitions

Project description

A package for combining multiple partitions into a consolidated clustering. The combinatorial optimization problem of obtaining such a consensus clustering is reformulated in terms of approximation algorithms for graph or hyper-graph partitioning.
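To make the graph and hyper-graph view concrete, here is a minimal pure-Python sketch of the two representations described by Strehl and Ghosh: a hyper-graph in which every cluster of every run is one hyperedge, and a co-association matrix giving the fraction of runs in which two samples share a cluster. The function names and the toy ensemble are illustrative assumptions, not part of the Cluster_Ensembles API, and the package's own implementation may differ.

```python
def hypergraph_incidence(cluster_runs):
    """Return (hyperedges, H) for an ensemble of label lists.

    Each hyperedge is a (run_index, label) pair; H[i][j] is 1 when
    sample i belongs to hyperedge j and 0 otherwise.
    """
    n_samples = len(cluster_runs[0])
    hyperedges = []
    for run_index, labels in enumerate(cluster_runs):
        for label in sorted(set(labels)):
            hyperedges.append((run_index, label))
    H = [[1 if cluster_runs[run][i] == label else 0
          for (run, label) in hyperedges]
         for i in range(n_samples)]
    return hyperedges, H


def co_association(cluster_runs):
    """S[i][j] is the fraction of runs placing samples i and j together."""
    n_runs = len(cluster_runs)
    n_samples = len(cluster_runs[0])
    counts = [[0] * n_samples for _ in range(n_samples)]
    for labels in cluster_runs:
        for i in range(n_samples):
            for j in range(n_samples):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / float(n_runs) for c in row] for row in counts]


# Toy ensemble (hypothetical data): 3 runs over 4 samples.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
edges, H = hypergraph_incidence(runs)
S = co_association(runs)
```

A consensus clustering can then be sought by partitioning the graph whose edge weights are given by S, or the hyper-graph described by H.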

Installation

Cluster_Ensembles is written in Python and in C. You need Python 2.7, its Standard Library and the following packages:
* NumPy (version 1.9.0 or later);
* SciPy;
* scikit-learn;
* setuptools;
* PyTables.

Before running Cluster_Ensembles, you should also complete the platform-specific steps below.

On CentOS, Fedora or another Red Hat Linux distribution:
* open a terminal console;
* type in: sudo dnf install glibc.i686.

This installs the GNU C library that is required to run a 32-bit executable binary on a 64-bit Linux kernel. That executable is tasked with hyper-graph partitioning. If you skip this step, running the Cluster_Ensembles package later fails with a "bad ELF interpreter" error message.

On a Debian or Ubuntu platform, the following commands should yield the same outcome:
* open a terminal console;
* type in: sudo dpkg --add-architecture i386 to add the i386 architecture;
* enter: sudo apt-get install libc6:i386.
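If you are unsure whether a given executable is a 32-bit binary, the ELF header records it: a file starts with the four magic bytes 0x7F 'E' 'L' 'F', and the fifth byte is 1 for 32-bit or 2 for 64-bit. The helper below is a minimal sketch of that check; elf_class is a hypothetical name and is not part of Cluster_Ensembles.

```python
import sys


def elf_class(path):
    """Return "32-bit", "64-bit" or "unknown" for an ELF file, else None."""
    with open(path, "rb") as handle:
        header = bytearray(handle.read(5))
    # Anything that does not start with the ELF magic is not an ELF file.
    if len(header) < 5 or bytes(header[:4]) != b"\x7fELF":
        return None
    # Fifth byte (EI_CLASS): 1 means 32-bit, 2 means 64-bit.
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")


# Example: inspect the running Python interpreter itself.
# On platforms that do not use ELF this prints None.
print(elf_class(sys.executable))
```

Pointing the same function at the hyper-graph partitioning executable shipped with the package should report a 32-bit binary, which is why the i386 C library is needed on a 64-bit system.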

Upon completion of the steps outlined above, install Cluster_Ensembles from the Python Package Index (PyPI) as follows:
* open a terminal console;
* enter: pip install Cluster_Ensembles.

Any missing third-party dependency should be resolved automatically. Please note that the installation also compiles some C code, according to the specifications of your machine, that the Cluster_Ensembles package later needs to determine a graph partition. You therefore need to ensure that CMake and GNU make are available on your operating system.
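As a quick check before running pip, the sketch below looks for the two build tools on the PATH. It assumes that they are installed under the executable names cmake and make (on some systems GNU make is called gmake instead); missing_build_tools is a hypothetical helper name, not something the package provides.

```python
try:
    from shutil import which  # Python 3.3 and later
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7


def missing_build_tools(tools=("cmake", "make")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_build_tools()
if missing:
    print("Install before running pip: " + ", ".join(missing))
else:
    print("cmake and make are both available")
```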

Usage

>>> import numpy as np
>>> import Cluster_Ensembles as CE
>>> # One row per clustering run, one column per sample: 50 runs over 15000 samples, labels in [0, 50)
>>> cluster_runs = np.random.randint(0, 50, (50, 15000))
>>> consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose=True, N_clusters_max=50)
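One way to judge a consensus labelling is the score used by Strehl and Ghosh: the average normalized mutual information (NMI) between the consensus and each run of the ensemble, with NMI defined as the mutual information divided by the geometric mean of the two entropies. The pure-Python sketch below computes it on plain lists; the function names and toy labels are illustrative assumptions and none of them belong to the Cluster_Ensembles API.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = float(len(labels))
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same samples."""
    n = float(len(a))
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((c / n) * log(c * n / (count_a[x] * count_b[y]))
               for (x, y), c in joint.items())


def nmi(a, b):
    """Mutual information normalized by sqrt(H(a) * H(b))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        return 0.0  # a single-cluster labeling carries no information
    return mutual_information(a, b) / sqrt(h_a * h_b)


def average_nmi(consensus, cluster_runs):
    """Mean NMI between a consensus labeling and every run."""
    return sum(nmi(consensus, run) for run in cluster_runs) / len(cluster_runs)


# Toy data (hypothetical): the consensus matches the first run up to a
# relabeling and is unrelated to the second.
consensus = [0, 0, 1, 1]
runs = [[1, 1, 0, 0], [0, 1, 0, 1]]
score = average_nmi(consensus, runs)
```

With real data, consensus would be the array returned by CE.cluster_ensembles and runs the rows of cluster_runs, converted to lists if needed.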

References

Giecold, G., Marco, E., Trippa, L. and Yuan, G.-C., “Robust Inference of Cell Lineages”, to appear.
Strehl, A. and Ghosh, J., “Cluster Ensembles - A Knowledge Reuse Framework for Combining Multiple Partitions”, Journal of Machine Learning Research, 3, pp. 583-617, 2002.

IMPORTANT NOTICE

A more detailed README file and expanded docstrings will be posted soon.

Release history

* 1.16: Dec 23, 2015;
* 1.15: Dec 9, 2015 (this version);
* 1.14: Dec 8, 2015.

Download files

Source Distribution

Cluster_Ensembles-1.15.tar.gz (5.3 MB), uploaded Dec 9, 2015

Hashes for Cluster_Ensembles-1.15.tar.gz
Algorithm	Hash digest
SHA256	`5f240c46805e5f55d89f94a0a5c636c710e03d86d1b5985ef05fe46a29669ef5`
MD5	`ddc47c08dabaee19dde4773fb5b56386`
BLAKE2b-256	`8fd1984bc7459c61213f319eb3b56e44d85fec220db93bc5d7bf1f515a16f34c`