Resource for working with microhaploptype data from the ALFRED database (https://alfred.med.yale.edu).

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Environment
- Console
Framework
- IPython
- Jupyter
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

MicroHapDB

Daniel Standage, 2018
https://github.com/bioforensics/microhapdb

MicroHapDB is a package designed for scientists and researchers interested in microhaplotype analysis. This package is a distribution and convenience mechanism and does not implement any analytics itself. MicroHapDB is designed to work with microhap data from any source, although currently all data was obtained from the Allele Frequency Database (ALFRED)^[1] at the Yale University School of Medicine.

Installation

To install:

pip3 install microhapdb

To make sure the package installed correctly:

pip3 install pytest
pytest --pyargs microhapdb --doctest-modules

MicroHapDB requires Python version 3.

Usage

MicroHapDB provides several convenient methods to access microhaplotype data.

a command-line interface
a Python API
a collection of tab-delimited text files

Command-line interface

Invoke microhapdb --help for a description of the command-line configuration options and several usage examples.

Python API

Programmatic access to microhap data within Python is as simple as invoking import microhapdb and querying the following tables.

microhapdb.frequencies
microhapdb.loci
microhapdb.populations
microhapdb.variants

Each is a Pandas^[2] dataframe object, supporting convenient and efficient listing, subsetting, and query capabilities. There are also two auxiliary tables: one that contains a mapping of all variants to their corresponding microhap loci, and another table cross-referencing external IDs/labels/names with internal MicroHapDB identifiers.

microhapdb.variantmap
microhapdb.idmap

The helper function microhapdb.id_xref is also useful for retrieving data using any valid identifiers. The following example demonstrates how data across the different tables can be cross-referenced.

>>> import microhapdb
>>> microhapdb.id_xref('mh02KK-136')
              ID Reference Chrom      Start        End  Source
182  MHDBL000183    GRCh38  chr2  227227673  227227743  ALFRED
>>> pops = microhapdb.populations.query('Name.str.contains("Amer")')
>>> pops
             ID                Name  Source
40  MHDBP000041   African Americans  ALFRED
67  MHDBP000068   African Americans  ALFRED
91  MHDBP000092  European Americans  ALFRED
>>> f = microhapdb.frequencies
>>> f[(f.Locus == "MHDBL000183") & (f.Population.isin(pops.ID))]
             Locus   Population Allele  Frequency
75117  MHDBL000183  MHDBP000041  G,T,C      0.172
75118  MHDBL000183  MHDBP000041  G,T,A      0.103
75119  MHDBL000183  MHDBP000041  G,C,C      0.029
75120  MHDBL000183  MHDBP000041  G,C,A      0.000
75121  MHDBL000183  MHDBP000041  T,T,C      0.293
75122  MHDBL000183  MHDBP000041  T,T,A      0.063
75123  MHDBL000183  MHDBP000041  T,C,C      0.132
75124  MHDBL000183  MHDBP000041  T,C,A      0.207
75333  MHDBL000183  MHDBP000068  G,T,C      0.156
75334  MHDBL000183  MHDBP000068  G,T,A      0.148
75335  MHDBL000183  MHDBP000068  G,C,C      0.016
75336  MHDBL000183  MHDBP000068  G,C,A      0.000
75337  MHDBL000183  MHDBP000068  T,T,C      0.336
75338  MHDBL000183  MHDBP000068  T,T,A      0.049
75339  MHDBL000183  MHDBP000068  T,C,C      0.156
75340  MHDBL000183  MHDBP000068  T,C,A      0.139
75525  MHDBL000183  MHDBP000092  G,T,C      0.384
75526  MHDBL000183  MHDBP000092  G,T,A      0.202
75527  MHDBL000183  MHDBP000092  G,C,C      0.000
75528  MHDBL000183  MHDBP000092  G,C,A      0.000
75529  MHDBL000183  MHDBP000092  T,T,C      0.197
75530  MHDBL000183  MHDBP000092  T,T,A      0.000
75531  MHDBL000183  MHDBP000092  T,C,C      0.071
75532  MHDBL000183  MHDBP000092  T,C,A      0.146

See the Pandas documentation for more details on dataframe access and query methods.

Tab-delimited text files

The data behind MicroHapDB is contained in 6 tab-delimited text files. If you'd prefer not to use MicroHapDB's command-line interface or Python API, it should be trivial load these files directly into R, Julia, or the data science environment of your choice. Invoke microhapdb files on the command line to see the location of the installed .tsv files.

locus.tsv: microhaplotype loci
variant.tsv: variants associated with each microhap locus
allele.tsv: allele frequencies for 148 loci across 84 populations
population.tsv: summary of the populations studied
variantmap.tsv: shows which variants are associated with which loci
idmap.tsv: mapping of all IDs/names/labels to internal MicroHapDB IDs

Citation

If you use this package, please cite our work.

Standage DS (2018) MicroHapDB: programmatic access to published microhaplotype data. GitHub repository, https://github.com/bioforensics/microhapdb.

^[1]Rajeevan H, Soundararajan U, Kidd JR, Pakstis AJ, Kidd KK (2012) ALFRED: an allele frequency resource for research and teaching. Nucleic Acids Research, 40(D1): D1010-D1015. doi:10.1093/nar/gkr924.

^[2]McKinney W (2010) Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 51-56.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Environment
- Console
Framework
- IPython
- Jupyter
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Release history Release notifications | RSS feed

0.11

Oct 26, 2023

0.10.1

Oct 13, 2023

0.10

Oct 12, 2023

0.9

Jul 6, 2023

0.7

Jan 20, 2022

0.7rc1 pre-release

Jan 20, 2022

0.4.3

Nov 6, 2019

0.4.1

Oct 23, 2019

0.4

Oct 23, 2019

This version

0.2

Dec 7, 2018

0.1.2

Oct 10, 2018

0.1.1

Oct 10, 2018

0.1.0

Oct 10, 2018

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

microhapdb-0.2.tar.gz (917.4 kB view hashes)

Uploaded Dec 7, 2018 Source

Hashes for microhapdb-0.2.tar.gz

Hashes for microhapdb-0.2.tar.gz
Algorithm	Hash digest
SHA256	`5d65dfc9eb53079076da7efadc9d0f5d32ffd3df88674c1bad70b36ca3b8d9e8`
MD5	`2f9d3c7790cc027c904fea82711ecbb0`
BLAKE2b-256	`799342989a5c94cff6e2be24185e240319ad7bd1402858d944efa81e52ae3e5a`