Skip to main content

Map gene ids using UniProt.

Project description

PyPI Build Status codecov

Tool for converting between various gene ids.

Installation

$ pip install gene_map

Usage

$ gene_map --help
Usage: gene_map [OPTIONS]

  Map gene ids between various formats.

Options:
  -i, --input TEXT                If it exists, treated as file with
                                  whitespace-separated gene ids. Otherwise
                                  treated as a gene id itself.  [required]
  --from TEXT                     Source ID type.  [required]
  --to TEXT                       Target ID type.  [required]
  -o, --output FILENAME           CSV-file to save result to.
  --organism [ARATH_3702|CAEEL_6239|CHICK_9031|DANRE_7955|DICDI_44689|DROME_7227|ECOLI_83333|HUMAN_9606|MOUSE_10090|RAT_10116|SCHPO_284812|YEAST_559292]
                                  Organism to convert IDs in.
  --cache-dir DIRECTORY           Folder to store ID-databases in.
  -q, --quiet                     Suppress logging of mapping-statistics.
  --force-download                Force download of mapping-database.
  --help                          Show this message and exit.

Getting started

Commandline usage

Inputs can be either gene ids or files containing whitespace-separated gene ids:

$ cat mygenes.txt
P63244 P08246
P68871
$ gene_map \
    -i P35222 -i InvalidID -i mygenes.txt -i P04637 \
    --from ACC --to Gene_Name \
    -o gene_mapping.csv
Mapped 5/6 genes.
$ cat gene_mapping.csv
ID_from,ID_to
P04637,TP53
P08246,ELANE
P35222,CTNNB1
P63244,RACK1
P68871,HBB

It is also possible to simply try to convert all given inputs without knowing their ID type, by using --from auto:

$ gene_map \
    -i P35222 \
    -i TP53 \
    -i '9606.ENSP00000306407' \
    --from auto \
    --to GeneID
Mapped 3/3 genes.
ID_from,ID_to
9606.ENSP00000306407,79007
P35222,1499
TP53,7157

Attention: if an ID is valid for multiple types, unintended side-effects may occur. Furthermore, all IDs are treated as strings.

API usage

>>> from gene_map import GeneMapper

>>> stringdb_ids = ['9606.ENSP00000306407', '9606.ENSP00000337461']
>>> gm = GeneMapper()  # defaults to HUMAN_9606
>>> gm.query(stringdb_ids, source_id_type='STRING', target_id_type='GeneID')
#                ID_from  ID_to
#0  9606.ENSP00000306407  79007
#1  9606.ENSP00000337461  90529

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gene_map-0.4.4.tar.gz (8.2 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page