Efficient calculation of phylogenetic distance matrices.

🌲 PhyloDM

Efficient calculation of pairwise phylogenetic distance matrices.

PhyloDM is a high-performance library that converts a phylogenetic tree into a pairwise distance matrix. It is designed to use minimal memory (<100 MB) and takes seconds to compute large trees (>20,000 taxa), whereas other libraries may take hours and use hundreds of GB of memory.

PhyloDM is written in Rust and exposed to Python via PyO3. It can therefore be used from either Python or Rust; this documentation covers usage from Python.

⚙ Installation

Requires Python 3.7+

Conda (recommended)

conda install -c bioconda phylodm

PyPI (alternative)

Pre-compiled binaries are packaged for most 64-bit platforms running Python 3.9 and 3.10. If you are running a different Python version, then you need to have Rust installed to compile the binaries.

python -m pip install phylodm

🐍 Quick-start

A pairwise distance matrix can be created from either a Newick file or a DendroPy tree.

from phylodm import PhyloDM

# PREPARATION: Create a test tree
with open('/tmp/newick.tree', 'w') as fh:
    fh.write('(A:4,(B:3,C:4):1);')

# 1a. From a Newick file
pdm = PhyloDM.load_from_newick_path('/tmp/newick.tree')

# 1b. From a DendroPy tree
import dendropy
tree = dendropy.Tree.get_from_path('/tmp/newick.tree', schema='newick')
pdm = PhyloDM.load_from_dendropy(tree)

# 2. Calculate the PDM
dm = pdm.dm(norm=False)
labels = pdm.taxa()

"""
/------------[4]------------ A
+
|          /---------[3]--------- B
\---[1]---+
           \------------[4]------------- C
           
labels = ('A', 'B', 'C')
    dm = [[0. 8. 9.]
          [8. 0. 7.]
          [9. 7. 0.]]
"""
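Each value in the matrix above is the sum of the branch lengths on the path between two leaves. The following is a minimal pure-Python sketch that reproduces the matrix by hand for the test tree; it does not use PhyloDM and is not how the library computes distances internally. The node names `root` and `inner` and both helper functions are hypothetical, introduced only for illustration.

```python
# Toy tree from the example: (A:4,(B:3,C:4):1);
# Each node maps to (parent, length of the edge leading to that parent).
# "root" and "inner" are hypothetical names for the unlabelled nodes.
PARENT = {
    "A": ("root", 4.0),
    "inner": ("root", 1.0),
    "B": ("inner", 3.0),
    "C": ("inner", 4.0),
}


def depths_to_ancestors(leaf):
    """Map the leaf and each of its ancestors to the path length from the leaf."""
    depths = {leaf: 0.0}
    node, total = leaf, 0.0
    while node in PARENT:
        parent, length = PARENT[node]
        total += length
        depths[parent] = total
        node = parent
    return depths


def patristic_distance(a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    up_a = depths_to_ancestors(a)
    up_b = depths_to_ancestors(b)
    # With non-negative branch lengths, the shared node with the shortest
    # combined path is the lowest common ancestor.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


labels = ("A", "B", "C")
matrix = [[patristic_distance(a, b) for b in labels] for a in labels]
print(matrix)
```

For example, the path from A to B crosses the edges of length 4, 1 and 3, giving 8.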

Accessing data

The dm method returns a symmetric NumPy matrix, and the taxa method returns a tuple of labels in the same order as the matrix rows and columns.

# Calculate the PDM
dm = pdm.dm(norm=False)
labels = pdm.taxa()

"""
/------------[4]------------ A
+
|          /---------[3]--------- B
\---[1]---+
           \------------[4]------------- C
           
labels = ('A', 'B', 'C')
    dm = [[0. 8. 9.]
          [8. 0. 7.]
          [9. 7. 0.]]
"""

# e.g. The following two equivalent expressions get the distance between A and B
dm[0, 1]  # 8
dm[labels.index('A'), labels.index('B')]  # 8
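Calling `labels.index(...)` scans the tuple on every lookup, which can add up for trees with many taxa. One option is to build a label-to-position dictionary once and reuse it. The sketch below uses a nested list as a stand-in for the matrix so that it runs without PhyloDM installed; `build_index` and `distance` are hypothetical helpers, not part of the PhyloDM API. The `dm[i][j]` form also works on a NumPy array.

```python
def build_index(labels):
    """Map each taxon label to its row/column position in the matrix."""
    return {label: i for i, label in enumerate(labels)}


def distance(dm, index, a, b):
    """Look up the distance between taxa a and b by label."""
    return dm[index[a]][index[b]]


# Stand-in values copied from the example above; in practice these would
# come from pdm.taxa() and pdm.dm(norm=False).
labels = ("A", "B", "C")
dm = [
    [0.0, 8.0, 9.0],
    [8.0, 0.0, 7.0],
    [9.0, 7.0, 0.0],
]

index = build_index(labels)
ab = distance(dm, index, "A", "B")
print(ab)
```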

Normalisation

If the norm argument of dm is set to True, then each distance is normalised by the sum of all edge lengths in the tree.
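As a rough illustration for the test tree used above, the sketch below applies that normalisation by hand. It assumes normalisation is a plain element-wise division by the total branch length, as described; it does not call PhyloDM, so the values show what `dm(norm=True)` would be expected to return under that assumption rather than verified library output.

```python
# Branch lengths of the test tree (A:4,(B:3,C:4):1);
edge_lengths = [4.0, 3.0, 4.0, 1.0]
total_length = sum(edge_lengths)

# Raw distances copied from the example matrix (norm=False).
raw = [
    [0.0, 8.0, 9.0],
    [8.0, 0.0, 7.0],
    [9.0, 7.0, 0.0],
]

# Assumption: each entry is divided by the sum of all edge lengths.
normalised = [[value / total_length for value in row] for row in raw]

for row in normalised:
    print([round(value, 4) for value in row])
```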

⏱ Performance

Tests were executed using scripts/performance/Snakefile on an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz.

For trees with many taxa, PhyloDM is the better choice. For trees with few taxa, DendroPy is a good option because of the wider range of features it provides.

Using PhyloDM for a large number of taxa, you can expect to use approximately the following resources, where x is the number of taxa:

  • Memory (GB) = 1.4863970739600885e-08 x^2 + 1.730990617342909e-06 x + 0.014523447553823836
  • Time (minutes) = 9.496032656158468e-10 x^2 - 3.7621666288523445e-06 x + 0.012201564275114034
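The two fitted curves above can be wrapped in small helpers to get a rough estimate before running a large tree. This is only a sketch: the function names are hypothetical, x is assumed to be the number of taxa, and the coefficients come from the benchmark machine listed above, so results on other hardware will differ.

```python
def estimate_memory_gb(n_taxa):
    """Approximate peak memory in GB, using the fitted quadratic above."""
    return (
        1.4863970739600885e-08 * n_taxa**2
        + 1.730990617342909e-06 * n_taxa
        + 0.014523447553823836
    )


def estimate_time_minutes(n_taxa):
    """Approximate run time in minutes, using the fitted quadratic above."""
    return (
        9.496032656158468e-10 * n_taxa**2
        - 3.7621666288523445e-06 * n_taxa
        + 0.012201564275114034
    )


n_taxa = 20_000
memory_gb = estimate_memory_gb(n_taxa)
time_min = estimate_time_minutes(n_taxa)
print(f"{n_taxa} taxa: ~{memory_gb:.2f} GB, ~{time_min * 60:.0f} seconds")
```

Both curves are quadratic in the number of taxa, which matches the size of the output: a full pairwise matrix has x^2 entries.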

PhyloDM vs DendroPy resource usage

