Reduced dimension embeddings for pathogen sequences

pathogen-embed

Create reduced dimension embeddings for pathogen sequences

pathogen-embed is open-source software that lets scientists, epidemiologists, and other researchers run reduced dimension embeddings (PCA, MDS, t-SNE, and UMAP) on viral populations. It is the source code from the paper Cartography, written by Sravani Nanduri and John Huddleston.

Documentation
Source Code
Bug reports

Build documentation

Build the documentation:

make -C docs html

Clean the docs:

make -C docs clean

Releasing a new version

Information about each file

README.md

Contains the description of the pathogen-embed package.

setup.py

Gives PyPI the package metadata: where to find dependent packages, the authors, relevant links, and so on. Also defines the entry point for the console script, which calls the main function of __main__.py.

__init__.py

Initializes the package, creates the parser for the command line arguments, and passes the parsed arguments to the function in embed.py.

__main__.py

Calls the "run" function in __init__.py, which calls embed.py.

embed.py

The main code for the package.
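The call chain described in this file list can be sketched in a single file. This is only an illustration under stated assumptions: the entry point string, the module path in it, and the bodies of make_parser, run, and embed are hypothetical stand-ins, not the package's actual code. Only the names main, run, and make_parser come from the text of this page.

```python
import argparse

# Hypothetical console-script declaration, as it might be passed to setup() in
# setup.py. The module path "src.__main__:main" is an assumption.
ENTRY_POINTS = {"console_scripts": ["embed = src.__main__:main"]}


def embed(args):
    """Stand-in for the function in embed.py that does the real work."""
    return f"would run {args.command}"


def make_parser():
    """Stand-in for the parser created by the package initializer."""
    parser = argparse.ArgumentParser(prog="embed")
    parser.add_argument("command", choices=["pca", "t-sne", "umap", "mds"])
    return parser


def run(argv=None):
    """Parse the command line arguments and pass them to embed()."""
    args = make_parser().parse_args(argv)
    return embed(args)


def main(argv=None):
    """Function named by the console script; it only delegates to run()."""
    return run(argv)
```

Calling main(["pca"]) in this sketch goes through run and the parser before reaching the stand-in embed function, which mirrors the order in which the files above hand off to each other.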

To create a new version

Run

python3 -m build

This creates the dist folder that gets uploaded to PyPI. Then upload it:

python3 -m twine upload dist/*

Enter your username and password to upload the new dist files to PyPI. Make sure the version in the dist folder does not already exist on PyPI.

Installing the package

Install the package with pip:

pip install pathogen-embed

src.embed module

Command line interface

See the full documentation for details. The summary below does not describe the individual named and positional arguments.

Reduced dimension embeddings for pathogen sequences

usage: embed [-h] [--distance-matrix DISTANCE_MATRIX] [--separator SEPARATOR]
             [--alignment ALIGNMENT] [--cluster-data CLUSTER_DATA]
             [--cluster-threshold CLUSTER_THRESHOLD]
             [--random-seed RANDOM_SEED] [--output-dataframe OUTPUT_DATAFRAME]
             [--output-figure OUTPUT_FIGURE]
             {pca,t-sne,umap,mds} ...

Sub-commands:

pca

Principal Component Analysis

embed pca [-h] [--components COMPONENTS]
          [--explained-variance EXPLAINED_VARIANCE]

t-sne

t-distributed Stochastic Neighbor Embedding

embed t-sne [-h] [--perplexity PERPLEXITY] [--learning-rate LEARNING_RATE]

umap

Uniform Manifold Approximation and Projection

embed umap [-h] [--nearest-neighbors NEAREST_NEIGHBORS] [--min-dist MIN_DIST]

mds

Multidimensional Scaling

embed mds [-h] [--components COMPONENTS]
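The usage text above maps naturally onto an argparse parser with one subparser per method. The following is a minimal sketch reconstructed only from that usage text, not the package's real parser: the function name make_parser_sketch is hypothetical, and every argument type (int, float, or plain string) is an assumption, since the usage lines do not state them.

```python
import argparse


def make_parser_sketch():
    """Build a parser that accepts the options listed in the usage text."""
    parser = argparse.ArgumentParser(
        prog="embed",
        description="Reduced dimension embeddings for pathogen sequences",
    )
    # Options shared by all methods; they go before the sub-command.
    parser.add_argument("--distance-matrix")
    parser.add_argument("--separator")
    parser.add_argument("--alignment")
    parser.add_argument("--cluster-data")
    parser.add_argument("--cluster-threshold", type=float)  # type assumed
    parser.add_argument("--random-seed", type=int)  # type assumed
    parser.add_argument("--output-dataframe")
    parser.add_argument("--output-figure")

    subparsers = parser.add_subparsers(dest="command", required=True)

    pca = subparsers.add_parser("pca", description="Principal Component Analysis")
    pca.add_argument("--components", type=int)  # type assumed
    pca.add_argument("--explained-variance")  # left as a string; meaning not stated

    tsne = subparsers.add_parser(
        "t-sne", description="t-distributed Stochastic Neighbor Embedding"
    )
    tsne.add_argument("--perplexity", type=float)  # type assumed
    tsne.add_argument("--learning-rate", type=float)  # type assumed

    umap = subparsers.add_parser(
        "umap", description="Uniform Manifold Approximation and Projection"
    )
    umap.add_argument("--nearest-neighbors", type=int)  # type assumed
    umap.add_argument("--min-dist", type=float)  # type assumed

    mds = subparsers.add_parser("mds", description="Multidimensional Scaling")
    mds.add_argument("--components", type=int)  # type assumed
    return parser


# Example: shared options first, then the sub-command and its own options.
args = make_parser_sketch().parse_args(
    ["--alignment", "aligned.fasta", "--random-seed", "314159",
     "t-sne", "--perplexity", "25.0"]
)
```

As in the usage text, the shared options precede the sub-command name, and each sub-command accepts only its own options. The file name aligned.fasta and the numeric values are placeholders.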

API

src.embed.get_hamming_distances(genomes)

Calculate pairwise Hamming distances between the given list of genomes and return the nonredundant array of values for use with scipy’s squareform function. Bases other than standard nucleotides (A, T, C, G) are ignored.

  • Parameters

    genomes (list) – a list of strings corresponding to genomes that should be compared

  • Returns

    a list of pairwise Hamming distances in condensed (vector-form) order, one value per distinct pair of genomes

  • Return type

    list

>>> genomes = ["ATGCT", "ATGCT", "ACGCT"]
>>> get_hamming_distances(genomes)
[0, 1, 1]
>>> genomes = ["AT-GCT", "AT--CT", "AC--CT"]
>>> get_hamming_distances(genomes)
[0, 1, 1]
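The behavior documented above can be reproduced with a short pure-Python sketch. It is an illustration of the described semantics, not the code in src.embed: the names hamming_distances_sketch and condensed_to_square are hypothetical, and the sketch assumes uppercase sequences of equal length. The second helper shows, without requiring scipy, the square matrix that the condensed vector corresponds to.

```python
from itertools import combinations

# Only these characters count as comparable bases; gaps, N, and other
# characters are skipped. Lowercase input is not handled (assumption).
VALID_BASES = frozenset("ACGT")


def hamming_distances_sketch(genomes):
    """Return pairwise Hamming distances in condensed (vector) form.

    A position contributes to the distance only when both sequences have a
    standard base there and the two bases differ.
    """
    distances = []
    for first, second in combinations(genomes, 2):
        mismatches = sum(
            1
            for x, y in zip(first, second)
            if x in VALID_BASES and y in VALID_BASES and x != y
        )
        distances.append(mismatches)
    return distances


def condensed_to_square(distances, n):
    """Expand a condensed distance vector into a symmetric n x n matrix."""
    square = [[0] * n for _ in range(n)]
    for (i, j), distance in zip(combinations(range(n), 2), distances):
        square[i][j] = square[j][i] = distance
    return square


condensed = hamming_distances_sketch(["AT-GCT", "AT--CT", "AC--CT"])
matrix = condensed_to_square(condensed, 3)
```

The pairs are visited in the order (0, 1), (0, 2), (1, 2), which is the same ordering scipy's squareform expects for a condensed distance vector.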

Issues and fixes:

Issue: The documentation would not render.

Fix: Used sphinx-book-theme version 0.3.3 for backwards compatibility.

Issue: index.rst: Module "src" has no attribute "make_parser". Incorrect argparse :module: or :func: values?

Fix: Changed the module from src to src.embed.

Download files

Source distribution: pathogen-embed-1.1.1.tar.gz (11.9 kB)

Built distribution: pathogen_embed-1.1.1-py3-none-any.whl (11.1 kB, Python 3)
