
Extract the DNA sequences of CDS, rRNA, or tRNA genes from a GenBank file.


gbseqextractor

Updates

version 0.0.5:
Merged pull request #4 from gopalpeddinti/patch-1, which fixes a Biopython deprecation warning.

version 20201128:
1. Now we can handle compound locations (feature locations with "join")!
2. We can also output the translation of each CDS.
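A "join" location lists several ranges that are concatenated, in the order given, to form one feature (for example a CDS split across exons, or one that spans the origin of a circular genome). A surrounding complement(...) means the feature lies on the minus strand and is reverse-complemented. In Biopython such locations are CompoundLocation objects and SeqFeature.extract() resolves them. The plain-Python sketch below only illustrates the idea for a subset of the location grammar (plain ranges, join and complement; no order(), remote references, or '^' sites). The function names are hypothetical and this is not gbseqextractor's own code.

```python
import re

# Complement table for unambiguous bases (IUPAC ambiguity codes omitted for brevity).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
# A single range 'N..M' or single base 'N', with optional '<' / '>' partial markers.
_RANGE = re.compile(r"<?(\d+)(?:\.\.>?(\d+))?")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_top_level(text: str) -> list:
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def extract_location(location: str, seq: str) -> str:
    """Return the bases described by a GenBank location (1-based, inclusive)."""
    location = location.strip()
    if location.startswith("complement(") and location.endswith(")"):
        # Minus strand: extract the inner location, then reverse-complement it.
        return reverse_complement(extract_location(location[len("complement("):-1], seq))
    if location.startswith("join(") and location.endswith(")"):
        # Parts are concatenated in the order they are listed.
        inner = location[len("join("):-1]
        return "".join(extract_location(p, seq) for p in split_top_level(inner))
    m = _RANGE.fullmatch(location)
    if m is None:
        raise ValueError(f"unsupported location: {location!r}")
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    return seq[start - 1:end]


genome = "AAACCCGGGTTT"
print(extract_location("join(1..3,7..9)", genome))              # AAAGGG
print(extract_location("complement(join(1..3,7..9))", genome))  # CCCTTT
```

Recursing on the location string keeps nested forms such as join(complement(7..9),complement(1..3)) working without a separate parser.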

1 Introduction

gbseqextractor is a tool to extract the DNA sequences of CDS, rRNA, or tRNA genes from a GenBank file. It is built on Biopython (http://www.biopython.org/).
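Conceptually, the tool walks through the features of a GenBank record, keeps those of the requested types, and names each one after its /gene= or /product= qualifier, skipping features that have neither (see the usage message in section 3). A minimal plain-Python sketch of that selection step follows. It is not gbseqextractor's source: the feature tuples and function names are hypothetical, qualifiers are modelled as a dict of lists the way Biopython's SeqFeature.qualifiers stores them, and it assumes /gene= takes precedence over /product=.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical stand-in for a parsed feature: (feature type, qualifiers).
Feature = Tuple[str, Dict[str, List[str]]]


def seq_id_for(qualifiers: Dict[str, List[str]]) -> Optional[str]:
    """Return the /gene= value, else the /product= value, else None."""
    for key in ("gene", "product"):
        values = qualifiers.get(key)
        if values:
            return values[0]
    return None


def select_features(features: Iterable[Feature],
                    wanted_types: Iterable[str] = ("CDS",)) -> Iterator[Tuple[str, str]]:
    """Yield (seq_id, feature_type) for each wanted feature that can be named.

    Every feature is reported on its own; nothing is merged across features.
    """
    wanted = set(wanted_types)
    for feature_type, qualifiers in features:
        if feature_type not in wanted:
            continue
        seq_id = seq_id_for(qualifiers)
        if seq_id is None:
            # Neither /gene= nor /product= is present: the feature is skipped.
            continue
        yield seq_id, feature_type


features = [
    ("CDS", {"gene": ["COX1"], "product": ["cytochrome c oxidase subunit I"]}),
    ("tRNA", {"product": ["tRNA-Leu"]}),
    ("CDS", {"note": ["no name"]}),
    ("rRNA", {"product": ["16S ribosomal RNA"]}),
]
print(list(select_features(features, ["CDS", "tRNA"])))
# [('COX1', 'CDS'), ('tRNA-Leu', 'tRNA')]
```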

2 Installation

pip install gbseqextractor

This creates a gbseqextractor command in the same directory as your pip executable.
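If you drive the tool from a Python pipeline, one option is to assemble the argument list and call it with subprocess. The sketch below uses only flags listed in the usage message in section 3 (-f, -prefix, -types, -cds_translation, -F); the helper names and the file name mito.gb are hypothetical.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_command(genbank_file: str, prefix: str,
                  types: Sequence[str] = ("CDS",),
                  cds_translation: bool = False,
                  full_length_only: bool = False) -> List[str]:
    """Assemble a gbseqextractor command line from the documented flags."""
    # -types accepts one or more values, so they follow the flag directly.
    cmd = ["gbseqextractor", "-f", genbank_file, "-prefix", prefix, "-types", *types]
    if cds_translation:
        cmd.append("-cds_translation")  # requires CDS among the types
    if full_length_only:
        cmd.append("-F")  # drop features with '<' or '>' in their location
    return cmd


def run_gbseqextractor(cmd: List[str]) -> None:
    """Run the command, failing early if it is not on PATH."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found; is pip's bin directory on PATH?")
    subprocess.run(cmd, check=True)


cmd = build_command("mito.gb", "mito", types=["CDS", "rRNA"], cds_translation=True)
print(" ".join(cmd))
# gbseqextractor -f mito.gb -prefix mito -types CDS rRNA -cds_translation
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names; call run_gbseqextractor(cmd) once the input file exists.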

3 Usage

$ gbseqextractor
usage: gbseqextractor.py [-h] -f <STR> -prefix <STR> [-seqPrefix <STR>]
                         [-types {CDS,rRNA,tRNA,wholeseq,gene} [{CDS,rRNA,tRNA,wholeseq,gene} ...]] [-cds_translation]
                         [-gi] [-p] [-t] [-s] [-l] [-rv] [-F]

Extract the DNA sequences of CDS, rRNA, or tRNA genes from a GenBank file.

The seq ID will be the value of '/gene=' or '/product='; if neither is
present, the gene will not be output!

version 20201128:
    Now we can handle compound locations (feature locations with "join")!
    We can also output the translation of each CDS (retrieved from '/translation=')

Please cite:
Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu,
MitoZ: a toolkit for animal mitochondrial genome assembly, annotation
and visualization, Nucleic Acids Research, https://doi.org/10.1093/nar/gkz173



optional arguments:
  -h, --help            show this help message and exit
  -f <STR>              GenBank file. required.
  -prefix <STR>         prefix of output file. required.
  -seqPrefix <STR>      prefix of each seq id. default: None
  -types {CDS,rRNA,tRNA,wholeseq,gene} [{CDS,rRNA,tRNA,wholeseq,gene} ...]
                        which kinds of features to extract; 'wholeseq' outputs the whole sequence in FASTA format.
                        WARNING: Each sequence in the result files corresponds to ONE feature in the GenBank file; I will
                        NOT combine multiple CDS of the same gene into ONE! [CDS]
  -cds_translation      Also output translated CDS (requires -types CDS). The translations are retrieved directly from
                        the '/translation=' qualifier. [False]
  -gi                   use the gi number as sequence ID instead of the accession number when a gi number is present.
                        (default: accession number)
  -p                    output the position information on the ID line. Warning: positions on the ID line are 0-based
                        (the left-most base is 0)! [False]
  -t                    output the taxonomy lineage on the ID line [False]
  -s                    output the species name on the ID line [False]
  -l                    output the seq length on the ID line [False]
  -rv                   reverse-complement the sequences of genes on the minus strand. Always True!
  -F                    only output full-length genes, i.e., exclude genes with '>' or '<' in their location [False]
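The -p, -l and -F options all depend on how a feature's location is read. GenBank writes locations 1-based and inclusive (10..20), whereas Biopython's FeatureLocation stores 0-based, half-open coordinates (start 9, end 20), which is presumably why the -p warning says positions are 0-based. Partial ends are marked with '<' and '>' (BeforePosition and AfterPosition in Biopython), which is what -F filters on. The plain-Python sketch below shows these conventions only; the helper names are hypothetical, the exact text gbseqextractor writes on the ID line is not reproduced, and remote references such as J00194.1:100..202 are not handled.

```python
import re
from typing import List, Tuple

# One 'N..M' range (or a single base 'N'), with optional '<' / '>' partial markers.
_RANGE = re.compile(r"<?(\d+)(?:\.\.>?(\d+))?")


def is_full_length(location: str) -> bool:
    """The -F rule: a feature is partial if its location contains '<' or '>'."""
    return "<" not in location and ">" not in location


def zero_based_parts(location: str) -> List[Tuple[int, int]]:
    """Convert each 1-based, inclusive range into a 0-based, half-open (start, end)."""
    parts = []
    for m in _RANGE.finditer(location):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        parts.append((start - 1, end))
    return parts


def feature_length(location: str) -> int:
    """Total length in bases, summed over all parts of a join()."""
    return sum(end - start for start, end in zero_based_parts(location))


for loc in ["join(1..3,7..9)", "complement(<10..>20)"]:
    print(loc, zero_based_parts(loc), feature_length(loc), is_full_length(loc))
# join(1..3,7..9) [(0, 3), (6, 9)] 6 True
# complement(<10..>20) [(9, 20)] 11 False
```

With half-open coordinates the length of each part is simply end - start, which is one reason Biopython (and Python slicing) use this convention.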

Author

Guanliang MENG

Citation

This script is part of the MitoZ package. If you use it in your work, please cite:

Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu,
MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization, Nucleic Acids Research, https://doi.org/10.1093/nar/gkz173

Since gbseqextractor makes use of Biopython, you should also cite Biopython if you use gbseqextractor in your work:

Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon: “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. Bioinformatics 25 (11), 1422–1423 (2009). https://doi.org/10.1093/bioinformatics/btp163

Please go to http://www.biopython.org/ for more details.



Download files

Source Distribution

gbseqextractor-0.0.5.tar.gz (17.3 kB)
