skip to navigation
skip to content

AgglomCluster 1.0.2

Performs greedy agglomerative clustering on network-x graphs

Latest Version: 2.0.4

Agglomerative clustering tool for network-x graphs


Implements the algorithm described by: “Fast algorithm for detecting community structure in networks” M. E. J. Newman. 2004 This is a greedy agglomerative hierarchical clustering algorithm. The alogorithm efficiently clusters large number of nodes (this is one of the best scaling clustering algorithms) while producing a suggested number of clusters. See papers on scaling and accuracy questions regarding greedy Newman.

This implementation uses a heap to select the best pair to cluster at each iteration - A naive implementation considers all “n” edges in the graph (O(n)) - A heap reduces this search dramatically (O(log(n))


allset – for automatic module importing networkx – supported graphing library


  • The actual Modularity score does not exactly match the Modularity score of the example on the wikipedia page
  • Does not work for directed graphs (TODO operate on the undirected graph)
  • Does not work for negative graphs (TODO add this capability)
  • Does not handle disconnected components (unless than are components of size 1)
  • Clustering needs to move to a function call rather than an object holder (return dendrogram object)
  • Node relabeling is messy
  • Dendrogram crawling is used for two separate purposes which aren’t clearly defined/called


NewmanGreedy objects store the following attributes * Supergraph - Duplicate of the original graph. Gets manipulated during the clustering: edges and nodes are condensed and reweighted - Cluster degree stored in node attribute - Number of connections between clusters stored in edge attribute (weighted edge) - Implicitly keeps track of the existing nodes which is required for the heap * Dendrogram - Stores the clustering history in a tree * Heap - Stores the modularity quality difference for combining each pair of existing nodes - Popping returns the node pair with the largest modulairty/quality difference, then smallest id1, then smallest id2 * Stored as a tuple (value, id1, id2) where the modularity/value is negated, and id1 is the smaller of the two * Processing the smaller IDs first means that smaller clusters will be chosen first during modularity tie breaking - Non-existing nodes are not actively removed from the heap, so pairs with non-existing nodes are ignored when popping * Quality History - Charts the Modularity score for each number of clustering


Author(s): Ethan Lozano & Matthew Seal

Collaborator(s): Zubin Jelveh

File Type Py Version Uploaded on Size (md5) Source 2014-06-10 20KB