Download genome files from the NCBI FTP server.
Project description
This is a set of scripts that focuses on the actual genome downloading.
Installation
pip install ncbi-genome-download
Alternatively, clone this repository from GitHub, then run (in a Python virtual environment):
pip install .
If this fails on older versions of Python, try updating your pip tool first:
pip install --upgrade pip
and then rerun the ncbi-genome-download install.
Usage
To download all bacterial RefSeq genomes in GenBank format from NCBI, run the following:
ncbi-genome-download bacteria
Downloading multiple groups is also possible:
ncbi-genome-download bacteria,viral
If you’re on a reasonably fast connection, you might want to try running multiple downloads in parallel:
ncbi-genome-download bacteria --parallel 4
To download all fungal GenBank genomes from NCBI in GenBank format, run:
ncbi-genome-download --section genbank fungi
To download all viral RefSeq genomes in FASTA format, run:
ncbi-genome-download --format fasta viral
It is possible to download multiple formats by supplying a list of formats or simply download all formats:
ncbi-genome-download --format fasta,assembly-report viral
ncbi-genome-download --format all viral
To download only completed bacterial RefSeq genomes in GenBank format, run:
ncbi-genome-download --assembly-level complete bacteria
To download only bacterial reference genomes from RefSeq in GenBank format, run:
ncbi-genome-download --refseq-category reference bacteria
To download bacterial RefSeq genomes of the genus Streptomyces, run:
ncbi-genome-download --genus Streptomyces bacteria
Note: This is a simple string match on the organism name provided by NCBI only.
Because the match is on the organism name, you can also use this option to download genomes of a particular species:
ncbi-genome-download --genus "Streptomyces coelicolor" bacteria
Multiple genera are also possible:
ncbi-genome-download --genus "Streptomyces coelicolor,Escherichia coli" bacteria
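The note above only says that `--genus` is a simple string match on the organism name. As a rough illustration of why a full species name works as a filter, here is a minimal sketch that assumes a prefix match against each comma-separated term. The prefix rule, the helper name `matches_genus`, and the sample organism names are assumptions made for illustration; the tool's actual matching rule may differ.

```python
def matches_genus(organism_name, genus_option):
    """Return True if the organism name starts with any comma-separated term.

    Assumption: a prefix match. The tool itself is only documented as doing
    "a simple string match", so treat this as an illustration of the idea.
    """
    terms = [term.strip() for term in genus_option.split(",") if term.strip()]
    return any(organism_name.startswith(term) for term in terms)


# Sample organism names, chosen for illustration only.
names = [
    "Streptomyces coelicolor A3(2)",
    "Streptomyces griseus",
    "Escherichia coli str. K-12 substr. MG1655",
]

# The same value that is passed to --genus in the command above.
selected = [n for n in names if matches_genus(n, "Streptomyces coelicolor,Escherichia coli")]
```

Under this assumption, `selected` keeps the first and third names and drops `Streptomyces griseus`, which shares the genus but not the species term.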
To download bacterial RefSeq genomes based on their NCBI species taxonomy ID, run:
ncbi-genome-download --species-taxid 562 bacteria
Note: The above command will download all RefSeq genomes belonging to Escherichia coli.
To download a specific bacterial RefSeq genome based on its NCBI taxonomy ID, run:
ncbi-genome-download --taxid 511145 bacteria
Note: The above command will download the RefSeq genome belonging to Escherichia coli str. K-12 substr. MG1655.
It is also possible to download multiple species taxids or taxids by supplying the numbers in a comma-separated list:
ncbi-genome-download --taxid 9606,9685 --assembly-level chromosome vertebrate_mammalian
Note: The above command will download the reference genomes for cat and human.
To also create a human-readable directory structure alongside the NCBI layout, run:
ncbi-genome-download --human-readable bacteria
To get an overview of all options, run:
ncbi-genome-download --help
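When the commands above need to be generated from a script, one option is to assemble the argument list in Python and hand it to `subprocess`. The sketch below uses only flags that appear in the examples above; the helper names `build_command` and `run_download` and their parameter names are hypothetical, and not part of the package.

```python
import subprocess


def build_command(groups, section=None, file_format=None, assembly_level=None,
                  genus=None, taxid=None, parallel=None):
    """Assemble an ncbi-genome-download argument list from optional filters."""
    cmd = ["ncbi-genome-download"]
    options = [
        ("--section", section),
        ("--format", file_format),
        ("--assembly-level", assembly_level),
        ("--genus", genus),
        ("--taxid", taxid),
    ]
    for flag, value in options:
        # Each option is emitted only when a value was supplied.
        if value is None:
            continue
        # Lists become the comma-separated form shown in the examples above.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        cmd += [flag, str(value)]
    if parallel is not None:
        cmd += ["--parallel", str(parallel)]
    # Taxonomic groups go last, also comma-separated.
    cmd.append(",".join(groups))
    return cmd


def run_download(groups, **filters):
    """Run the assembled command; needs ncbi-genome-download on the PATH."""
    return subprocess.run(build_command(groups, **filters), check=True)
```

For example, `build_command(["vertebrate_mammalian"], taxid=[9606, 9685], assembly_level="chromosome")` produces the same arguments as the cat and human command above, with the options in a different order. Passing a list rather than a shell string avoids quoting problems with values such as `"Streptomyces coelicolor"`.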
As a method
import ncbi_genome_download as ngd
ngd.download()
Note: To specify a taxonomic group, like bacteria, use the group keyword.
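As a small illustration of the note above, the following sketch wraps the call in a helper that passes the `group` keyword. The helper name `download_group` is hypothetical; only `ngd.download()` and the `group` keyword come from this README, and every other setting is left at whatever the package uses by default.

```python
def download_group(group):
    """Download genomes for one taxonomic group via the module-level API."""
    # Imported here so that defining this helper does not require the package.
    import ncbi_genome_download as ngd

    # `group` is the keyword named in the note above; no other options are set.
    return ngd.download(group=group)
```

Calling `download_group("bacteria")` starts a real download over the network, so it is best tried with a small group first.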
License
Download files
Source Distribution
Hashes for ncbi-genome-download-0.2.6.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 5a532435ea5f68f28ef7e21ca75d349edf5eac3f25ca44378cd311338c8b79c5
MD5 | 7cab4ad236e514c464c14c7d94b9afc5
BLAKE2b-256 | e5e391cf5b01e3ef093768ffe3445a08e358f7c4438d3a5d395e40c5e6cea3b2
Built Distribution
Hashes for ncbi_genome_download-0.2.6-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0c6a1db0955f1d4f8501a5056f3650d3a571368bfbb161ce2ee7fdcf9dbe0e15
MD5 | 19210d0a2da4ae3589c8af5d4c0abb6c
BLAKE2b-256 | 4fb74ee9eac5be4405a33860d71fd668f7a1b88f1c5e24a85bd8297fbd909565
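A manually downloaded archive can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of` and `verify` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `verify(path_to_tarball, "5a532435ea5f68f28ef7e21ca75d349edf5eac3f25ca44378cd311338c8b79c5")` should return `True` for an intact download.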