Efficient calculation of phylogenetic distance matrices.
🌲 PhyloDM
Efficient calculation of pairwise phylogenetic distance matrices.
PhyloDM is a high-performance library that converts a phylogenetic tree into a pairwise distance matrix. It is designed to use minimal memory (<100 MB) and takes seconds to process large trees (>20,000 taxa), whereas other libraries may take hours and use hundreds of GB of memory.
PhyloDM is written in Rust and is exposed to Python via PyO3. It can therefore be used from either Python or Rust; however, the documentation is written for use in Python.
⚙ Installation
Requires Python 3.7+
Conda (recommended)
conda install -c bioconda phylodm
PyPI (alternative)
Pre-compiled binaries are packaged for most 64-bit platforms running Python 3.9 and 3.10. If you are running a different Python version, then you need to have Rust installed to compile the binaries.
python -m pip install phylodm
🐍 Quick-start
A pairwise distance matrix can be created from either a Newick file or a DendroPy tree.
from phylodm import PhyloDM
# PREPARATION: Create a test tree
with open('/tmp/newick.tree', 'w') as fh:
    fh.write('(A:4,(B:3,C:4):1);')
# 1a. From a Newick file
pdm = PhyloDM.load_from_newick_path('/tmp/newick.tree')
# 1b. From a DendroPy tree
import dendropy
tree = dendropy.Tree.get_from_path('/tmp/newick.tree', schema='newick')
pdm = PhyloDM.load_from_dendropy(tree)
# 2. Calculate the PDM
dm = pdm.dm(norm=False)
labels = pdm.taxa()
"""
/------------[4]------------ A
+
|          /---------[3]--------- B
\---[1]---+
           \------------[4]------------- C

labels = ('A', 'B', 'C')
    dm = [[0. 8. 9.]
          [8. 0. 7.]
          [9. 7. 0.]]
"""
Accessing data
The dm method returns a symmetric NumPy matrix, and the taxa method returns a tuple of labels in the matrix row/column order.
# Calculate the PDM
dm = pdm.dm(norm=False)
labels = pdm.taxa()
"""
/------------[4]------------ A
+
|          /---------[3]--------- B
\---[1]---+
           \------------[4]------------- C

labels = ('A', 'B', 'C')
    dm = [[0. 8. 9.]
          [8. 0. 7.]
          [9. 7. 0.]]
"""
# e.g. The following commands (equivalent) get the distance between A and B
dm[0, 1] # 8
dm[labels.index('A'), labels.index('B')] # 8
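When many lookups are needed, calling labels.index repeatedly rescans the tuple each time. One possible alternative, sketched below, builds a label-to-index dictionary once. The nested list is a stand-in for the matrix returned by dm so that the example runs without PhyloDM installed, and index_of and distance are hypothetical helper names, not part of the PhyloDM API.

```python
# Stand-ins for the values shown above; with PhyloDM installed these would
# come from pdm.taxa() and pdm.dm(norm=False).
labels = ('A', 'B', 'C')
dm = [[0.0, 8.0, 9.0],
      [8.0, 0.0, 7.0],
      [9.0, 7.0, 0.0]]

# Build the label -> row/column index mapping once.
index_of = {label: i for i, label in enumerate(labels)}


def distance(a, b):
    """Return the distance between taxa a and b (hypothetical helper)."""
    return dm[index_of[a]][index_of[b]]


print(distance('A', 'B'))  # 8.0
print(distance('C', 'B'))  # 7.0
```

The dm[i][j] indexing form also works on a NumPy array, so the helper should apply unchanged to the real matrix.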
Normalisation
If the norm argument of dm is set to True, then the data will be normalised by the sum of all edges in the tree.
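To make the normalisation concrete, the sketch below works through the arithmetic for the example tree (A:4,(B:3,C:4):1); by hand in plain Python. It illustrates the definition only and is not PhyloDM's implementation; the edge name "BC" for the internal branch and the helper patristic are assumptions introduced for this example.

```python
# Branch lengths of the example tree (A:4,(B:3,C:4):1);
# "BC" is a made-up name for the internal branch leading to B and C.
edges = {"A": 4.0, "B": 3.0, "C": 4.0, "BC": 1.0}

# Edges on the path from each leaf up to the root.
paths = {
    "A": ["A"],
    "B": ["B", "BC"],
    "C": ["C", "BC"],
}


def patristic(x, y):
    """Sum the edges that lie on exactly one of the two root paths."""
    unshared = set(paths[x]) ^ set(paths[y])
    return sum(edges[e] for e in unshared)


labels = ("A", "B", "C")
raw = [[patristic(r, c) for c in labels] for r in labels]

# Normalise by the sum of all edges in the tree: 4 + 3 + 4 + 1 = 12.
total = sum(edges.values())
normed = [[d / total for d in row] for row in raw]

print(raw)           # [[0, 8.0, 9.0], [8.0, 0, 7.0], [9.0, 7.0, 0]]
print(normed[0][1])  # 8 / 12, approximately 0.667
```

Under this definition, every normalised distance is the raw distance divided by 12, so the A to B entry becomes 8 / 12.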
⏱ Performance
Tests were executed using the scripts/phylodm_perf.py script with 10 trials.
These tests demonstrate that PhyloDM is more efficient than DendroPy's phylogenetic distance matrix when there are over 500 taxa in the tree. If there are fewer than 500 taxa, then use DendroPy for all of the great features it provides.
With 10,000 taxa in the tree, each program uses approximately:
- PhyloDM = 4 seconds / 40 MB memory
- DendroPy = 17 minutes / 22 GB memory