Lightweight High level Python 3 API for NCBI BLAST

Blastpy3

Lightweight High level Python 3 API for NCBI BLAST+ blastn


Blastn

This class wraps blastn and requires NCBI BLAST+ 2.2.28 or later to be installed.

Setup Blastn object: Create subject database

Upon instantiation, a database is created from the user-provided subject sequence. The database files are written to a temporary directory. The following parameters can be customized when instantiating a Blastn object:

  • ref_path: Path to the reference fasta file (not gzipped). Mandatory
  • makeblastdb_exec: Path of the makeblastdb executable. Default = "makeblastdb"
  • makeblastdb_opt: makeblastdb command line options as a string. Default = ""

To make sure the database files are deleted at the end of execution, use the object in a with statement. Alternatively, call the rm_db method once you are done with the Blastn object.

Code

with Blastn(ref_path="./subject.fa") as blastn:
    print(blastn)

Output

CREATE DATABASE: makeblastdb  -dbtype nucl -input_type fasta -in subject.fa -out temp_dir

MAKEBLASTDB CLASS	Parameters list
	db_dir	/tmp/tmplbkdwzm2
	db_path	/tmp/tmplbkdwzm2/Yeast
	makeblastdb_exec	makeblastdb
	makeblastdb_opt
	ref_path	./data/Yeast.fa
	verbose	False

Cleaning up blast DB files for "subject"
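
The two cleanup routes described above (the with statement, or an explicit rm_db call) follow the usual Python context-manager pattern. The sketch below illustrates that pattern only. TempDbDir is a hypothetical stand-in, not the Blastn class, and the way it derives db_path from the reference file name is an assumption made for the example.

```python
import os
import shutil
import tempfile


class TempDbDir:
    """Hypothetical stand-in for an object that owns temporary database files."""

    def __init__(self, ref_path):
        self.ref_path = ref_path
        # Temporary directory that will hold the database files
        self.db_dir = tempfile.mkdtemp()
        # Assumed naming: reference file name without its extension
        name = os.path.splitext(os.path.basename(ref_path))[0]
        self.db_path = os.path.join(self.db_dir, name)

    def rm_db(self):
        """Delete the temporary directory; safe to call more than once."""
        shutil.rmtree(self.db_dir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs even if the body of the with block raised
        self.rm_db()
        return False  # do not swallow exceptions


# Explicit cleanup, the alternative to the with statement
db = TempDbDir("./subject.fa")
try:
    print(db.db_path)
finally:
    db.rm_db()
```

With the explicit form, wrapping the work in try/finally keeps the cleanup guarantee that the with statement gives for free.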

Calling Blastn object: Perform Blastn and return a list of hits

The "align" method of a Blastn object can then be called with a query fasta file (query_path) or directly with a sequence string (query_seq). The following parameters can be customized when calling a Blastn object:

  • query_path: Path to a fasta file containing the query sequences (not gzipped). Mandatory unless query_seq is provided
  • query_seq: Query sequence given directly as a string
  • blast_exec: Path of the blast executable. Default = "blastn"
  • blastn_opt: Blastn command line options as a string. Default = ""
  • task: Type of blast to be performed ('blastn', 'blastn-short', 'dc-megablast', 'megablast', 'rmblastn'). Default = "dc-megablast"
  • evalue: E value cutoff to retain alignments. Default = 1
  • best_query_hit: Find and return only the best hit per query. Default = False

A list containing one BlastHit object per query hit found in the subject is returned. If no hit is found, None is returned instead. If best_query_hit is set to True, only the best hit for each query sequence in the query file is returned.
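
Because the return value can be None, calling code may want to normalize it before looping. The sketch below shows one way to do that and to pick one hit per query by hand. Hit is a hypothetical stand-in that only borrows the field names q_id, s_id, evalue and bscore from BlastHit, and ranking by highest bit score, then lowest E value, is an assumption: the criterion used by best_query_hit is not documented here.

```python
from collections import namedtuple

# Hypothetical stand-in carrying only the fields this sketch needs
Hit = namedtuple("Hit", ["q_id", "s_id", "evalue", "bscore"])


def best_hit_per_query(hit_list):
    """Return one hit per query ID, or an empty list if hit_list is None."""
    if hit_list is None:
        return []
    best = {}
    for hit in hit_list:
        current = best.get(hit.q_id)
        # Assumed ranking: higher bit score first, then lower E value
        if current is None or (hit.bscore, -hit.evalue) > (current.bscore, -current.evalue):
            best[hit.q_id] = hit
    # Dicts keep insertion order, so queries come back in first-seen order
    return list(best.values())


# Made-up values, for illustration only
hits = [
    Hit("query1", "subject", 1e-5, 50.0),
    Hit("query1", "subject", 1e-20, 80.0),
    Hit("query2", "subject", 1e-3, 30.0),
]
for hit in best_hit_per_query(hits):
    print(hit.q_id, hit.bscore)
```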

Code

with Blastn(ref_path="./subject.fa") as blastn:
    hit_list = blastn(query_path="./query.fa")
    for hit in hit_list:
        print(hit)

Output

CREATE DATABASE: makeblastdb  -dbtype nucl -input_type fasta -in ./subject.fa -out /tmp/tmp1ZBlfT/subject

MAKE BLAST: blastn  -num_threads 4 -task dc-megablast -evalue 1 -outfmt "6 std qseq" -dust no -query ./query.fa -db /tmp/tmp1ZBlfT/subject

	2 hits found
HIT 0	Query	query1:0-48(+)
	Subject	subject:19-67(+)
	Lenght : 48	Identity : 100.0%	Evalue : 2e-23	Bit score : 87.8
	Aligned query seq : GCATGCTCGATCAGTAGCTCTCAGTACGCATACGCTAGCATCACGACT

HIT 1	Query	query2:0-48(+)
	Subject	subject:89-137(+)
	Lenght : 48	Identity : 100.0%	Evalue : 2e-23	Bit score : 87.8
	Aligned query seq : CGCATCGACTCGATCTGATCAGCTCACAGTCAGCATCAGCTACGATCA

Cleaning up blast DB files for "subject"
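
The blastn command above requests -outfmt "6 std qseq", that is, tab-separated lines holding the 12 standard BLAST columns followed by the aligned query sequence. As a rough illustration of how such a line lines up with the BlastHit fields listed in the next section, here is a minimal parsing sketch. parse_outfmt6_line and ParsedHit are hypothetical names, the sample line is made up, and the package's own parsing (including any coordinate conversion or orientation handling) may differ.

```python
from collections import namedtuple

# Column order of `-outfmt "6 std qseq"`:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qseq
FIELDS = ["q_id", "s_id", "identity", "length", "mis", "gap",
          "q_start", "q_end", "s_start", "s_end", "evalue", "bscore", "q_seq"]
ParsedHit = namedtuple("ParsedHit", FIELDS)


def parse_outfmt6_line(line):
    """Split one tabular BLAST line and convert each column to its type."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != len(FIELDS):
        raise ValueError("expected %d columns, got %d" % (len(FIELDS), len(cols)))
    return ParsedHit(
        q_id=cols[0],
        s_id=cols[1],
        identity=float(cols[2]),
        length=int(cols[3]),
        mis=int(cols[4]),
        gap=int(cols[5]),  # BLAST reports gap openings in this column
        q_start=int(cols[6]),
        q_end=int(cols[7]),
        s_start=int(cols[8]),
        s_end=int(cols[9]),
        evalue=float(cols[10]),
        bscore=float(cols[11]),
        q_seq=cols[12],
    )


# Made-up line, for illustration only
line = "q\ts\t100.000\t4\t0\t0\t1\t4\t10\t13\t0.01\t8.0\tACGT\n"
print(parse_outfmt6_line(line))
```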

BlastHit

Python object representing a hit found by blastn. The object contains the following public fields:

  • id: Auto incremented unique identifier [INT]
  • q_id: Query sequence name [STR]
  • s_id: Subject sequence name [STR]
  • identity: Percent identity of the hit [FLOAT 0:100]
  • length: Length of the hit [INT >=0]
  • mis: Number of mismatches in the hit [INT >=0]
  • gap: Number of gaps in the hit [INT >=0]
  • q_start: Hit start position of the query sequence [INT >=0]
  • q_end: Hit end position of the query sequence [INT >=0]
  • s_start: Hit start position of the subject sequence [INT >=0]
  • s_end: Hit end position of the subject sequence [INT >=0]
  • evalue: E value of the alignment [FLOAT >=0]
  • bscore: Bit score of the alignment [FLOAT >=0]
  • q_seq: Sequence of the query aligned on the subject sequence [STR]
  • q_orient: Orientation of the query sequence [+ or -]
  • s_orient: Orientation of the subject sequence [+ or -]

The validity of the numeric values is checked upon instantiation. Invalid values raise assertion errors.
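
The ranges given in the field list can be turned into assertion-based checks along the following lines. This is only a sketch: check_hit_values is a hypothetical helper, not the BlastHit constructor, and the exact checks performed by the package may differ.

```python
def check_hit_values(identity, length, mis, gap,
                     q_start, q_end, s_start, s_end, evalue, bscore):
    """Raise AssertionError if a numeric field is outside its documented range."""
    # Note: assert statements are skipped when Python runs with -O
    assert 0 <= identity <= 100, "identity must be between 0 and 100"
    non_negative = {
        "length": length, "mis": mis, "gap": gap,
        "q_start": q_start, "q_end": q_end,
        "s_start": s_start, "s_end": s_end,
        "evalue": evalue, "bscore": bscore,
    }
    for name, value in non_negative.items():
        assert value >= 0, "%s must be >= 0" % name


# Valid values pass silently
check_hit_values(100.0, 10, 0, 0, 0, 10, 0, 10, 0.0, 0.0)

# A negative length is rejected
try:
    check_hit_values(100.0, -1, 0, 0, 0, 10, 0, 10, 0.0, 0.0)
except AssertionError as error:
    print("Invalid hit:", error)
```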

BlastHit objects can return a comprehensive report of themselves as an ordered dictionary:

Code

# Interactive import
from BlastHit import BlastHit

# Create a default BlastHit object
h = BlastHit()

# Call the report method
h.get_report(full=True)

Output

OrderedDict([('Query', 'query:0-10(+)'), ('Subject', 'subject:0-10(+)'), ('Identity', 100.0), ('Evalue', 0.0), ('Bit Score', 0.0), ('Hit length', 10), ('Number of gap', 0), ('Number of mismatch', 0)])
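
Since the report is an ordered dictionary, its keys and values keep a stable order and can be written out as a header row and a data row. The sketch below rebuilds the dictionary shown in the output above by hand so that it runs without the package. report_to_tsv is a hypothetical helper, not part of BlastHit.

```python
from collections import OrderedDict

# Same content as the report printed above, rebuilt by hand for the example
report = OrderedDict([
    ("Query", "query:0-10(+)"),
    ("Subject", "subject:0-10(+)"),
    ("Identity", 100.0),
    ("Evalue", 0.0),
    ("Bit Score", 0.0),
    ("Hit length", 10),
    ("Number of gap", 0),
    ("Number of mismatch", 0),
])


def report_to_tsv(report):
    """Return (header, row) as tab-separated strings, in dictionary order."""
    header = "\t".join(report.keys())
    row = "\t".join(str(value) for value in report.values())
    return header, row


header, row = report_to_tsv(report)
print(header)
print(row)
```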

Testing the pyBlast module

The module can be tested with pytest:

  • Install pytest with pip: pip install pytest
  • Run the tests with py.test-2.7 -v

Example output for a successful run. Note that some tests might fail because of the random sampling of DNA sequences and the uncertainties of the Blastn algorithm.

========================================== test session starts ===========================================
platform linux2 -- Python 2.7.5 -- py-1.4.27 -- pytest-2.7.0 -- /usr/bin/python
rootdir: /home/adrien/Programming/Python/pyBlast, inifile:
collected 21 items

test_pyBlast.py::test_BlastHit[4.16866907958-57-98-69-88-12-100-43-1.40452897105-47.3666242716] PASSED
test_pyBlast.py::test_BlastHit[-1-7-10-20-73-54-25-45-98.7921480151-45.2397166228] xfail
test_pyBlast.py::test_BlastHit[8.92741377413--1-100-36-34-33-14-71-18.8547135761-97.6604693294] xfail
test_pyBlast.py::test_BlastHit[10.5987790458-46--1-45-78-81-86-86-73.8740266727-56.887410005] xfail
test_pyBlast.py::test_BlastHit[66.8213911219-62-48--1-91-10-60-20-88.7850139735-81.7901609219] xfail
test_pyBlast.py::test_BlastHit[86.6626174287-29-83-34--1-53-57-68-17.9799756069-7.83036609495] xfail
test_pyBlast.py::test_BlastHit[5.23985331666-43-85-33-7--1-14-3-74.2130782704-88.9289495285] xfail
test_pyBlast.py::test_BlastHit[75.6935977321-8-78-68-10-39--1-74-44.1447867052-22.5203082483] xfail
test_pyBlast.py::test_BlastHit[39.8692596061-60-5-49-77-9-31--1-2.59963139531-46.3133849683] xfail
test_pyBlast.py::test_BlastHit[15.7192632366-24-92-1-64-82-83-90--1-75.5540618409] xfail
test_pyBlast.py::test_BlastHit[18.6627439886-34-57-60-5-45-26-40-77.7840842678--1] xfail
test_pyBlast.py::test_Blastn[blastn-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[blastn-Random queries] xfail
test_pyBlast.py::test_Blastn[blastn-short-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[blastn-short-Random queries] xfail
test_pyBlast.py::test_Blastn[dc-megablast-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[dc-megablast-Random queries] xfail
test_pyBlast.py::test_Blastn[megablast-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[megablast-Random queries] xfail
test_pyBlast.py::test_Blastn[rmblastn-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[rmblastn-Random queries] xfail

================================== 6 passed, 15 xfailed in 5.91 seconds ==================================

Dependencies

Authors and Contact

Adrien Leger - 2015
