Skip to main content

Disjoint-Set implementation with heuristic optimizations (union-by-rank and path-compression)

Project description

Python Version License

A Disjoint-Set implementation which makes use of the the “Union by Rank” and “Path Compression” heuristic optimizations while being as simple and effective as possible.

It support the following operations:

  • Find-Set in O(1) amortized running time

  • Union in O(1) amortized running time

Check end notes for additional information about running times

Why?

Sometimes, specially during algorithm competitions, a Union-Find data structure is needed to solve a certain problem (e.g. when implementing Kruskal’s MST algorithm) In those cases, we usually use whatever off-the-shelf implementation we happen to find in PyPI and, many times, we get the infamous submission fail Time Limit Exceed(TLE) because that specific Disjoint-Set implementation is not fast enough…

Worry not! A fast and simple enough (you can copy paste the code here) disjoint-set implementation is now available:

Installation

$ pip install dset

Usage

>>> import dset
>>> s1 = dset.Set()                 # To create new sets objects (it runs in O(1))
>>> s2 = dset.Set()
>>> s3 = dset.Set()
>>> dset.groups()                   # To check the number of different groups (it runs in O(1))
3
>>> dset.find(s1) == dset.find(s2)  # To check if two sets are in the same group
False
>>> dset.union(s1, s2)              # to group s1 and s2 sets together
>>> dset.groups()                   # Now there are only 2 groups: s1-s2 and s3
2
>>> dset.find(s1) == dset.find(s2)  # They are in the same group s1-s2
True
>>> dset.find(s2) == dset.find(s3)  # They are in distinct groups s1-s2 and s3
False
>>> dset.union(s2, s1)              # do nothing (no need to check if s1 and s2 belong to the same group)
>>> dset.groups()                   # Same as before only 2: s1-s2 and s3
2
>>> dset.union(s1, s3)              # group s1-s2 and s3 together.
>>> dset.groups()                   # Now there is only one group: s1-s2-s3
1
>>> dset.find(s1) == dset.find(s2)  # All the sets are in the same groups s1-s2-s3
... dset.find(s2) == dset.find(s3)
... dset.find(s1) == dset.find(s3)
True
True
True

Future improvements

  • Make official Sphinx Documentation. (minor releases)

  • Make test suite. (minor releases)

  • Change to a list-based implementation (major-minor release)
    • Improve running time (constant terms)

    • Improve memory space (constant terms)

  • Make the package C-based. (major-minor release)
    • Improve running time (constant terms)

Notes on Complexity Running Times

Separately, either “Union by Rank” or “Path Compression” improves the running time of the operations on disjoint-sets. Alone, “Union by Rank” yields a running time of O(m log n), where m is the number of Union and Find-Set operations and n is the number of sets.

The improvement is even greater when we use the two heuristics together; the worst-case running time is O(m f(n)), where f(n) is a very slowly growing function, the Inverse Ackermann function (e.g. for n equal to the number atoms in the universe, f(n) is only 4) So for any conceivable application, we can consider the running time of m Union + Find-Set operations to be O(m) and therefore both Union and Find-Set operations have O(1) amortized running time in practice.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dset-0.0.1.tar.gz (5.1 kB view hashes)

Uploaded Source

Built Distribution

dset-0.0.1-py3-none-any.whl (4.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page