A comprehensive library for computational molecular biology
Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or you would like to find disulfide bonds in a protein structure: Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:
Searching and fetching data from biological databases
Reading and writing popular sequence/structure file formats
Analyzing and editing sequence/structure data
Visualizing sequence/structure data
Interfacing external applications for further analysis
Biotite internally stores most of the data as NumPy ndarray objects, enabling
fast C-accelerated analysis,
intuitive usability through NumPy-like indexing syntax,
extensibility through direct access to the internal NumPy arrays.
As a result, users can skip writing code for basic functionality (such as file parsers) and focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.
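To illustrate why array-backed storage is convenient, here is a minimal sketch that encodes a sequence as integer codes in a NumPy array and then indexes and analyzes it with plain NumPy operations. It is not Biotite's implementation: the four-letter alphabet and the helper names `encode` and `decode` are hypothetical and exist only for this example.

```python
import numpy as np

# Hypothetical alphabet for illustration only; Biotite ships its own alphabets.
ALPHABET = "ACGT"
SYMBOLS = np.array(list(ALPHABET))


def encode(text):
    """Map each symbol to its integer index in ALPHABET."""
    lookup = {symbol: code for code, symbol in enumerate(ALPHABET)}
    return np.array([lookup[symbol] for symbol in text], dtype=np.uint8)


def decode(codes):
    """Map an array of integer codes back to a string."""
    return "".join(SYMBOLS[codes])


codes = encode("ACGTTGCA")

# NumPy-like indexing: slices and boolean masks work directly on the codes
first_four = decode(codes[:4])
without_t = decode(codes[codes != ALPHABET.index("T")])

# Vectorized analysis: count every symbol in a single call
counts = np.bincount(codes, minlength=len(ALPHABET))

# With this alphabet order, the complement of code c is 3 - c (A<->T, C<->G)
reverse_complement = decode((3 - codes)[::-1])

print(first_four, without_t, counts.tolist(), reverse_complement)
```

Because the data is an ordinary array, any NumPy function can be applied to it without converting it first.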
If you use Biotite in a scientific publication, please cite:
Installation
Biotite requires the following packages:
numpy
requests
msgpack
networkx
Some functions require additional packages:
mdtraj - Required for trajectory file I/O operations.
matplotlib - Required for plotting purposes.
Biotite can be installed via Conda…
$ conda install -c conda-forge biotite
… or pip
$ pip install biotite
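After installing, it can be useful to check which of the packages listed above are importable in the current environment. The following is a small sketch using only the standard library; the helper name `missing_packages` is made up for this example, and it assumes each package is imported under the same name it is listed with.

```python
from importlib.util import find_spec

# Package names taken from the lists above
REQUIRED = ["numpy", "requests", "msgpack", "networkx"]
OPTIONAL = ["mdtraj", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


print("Missing required packages:", missing_packages(REQUIRED))
print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list for the required packages means the dependencies are in place; a missing optional package only affects the functions that depend on it.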
Usage
Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:
import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez

# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)

# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()

# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
print(alignments[0])
MVHATSPLLLLLLLSLALVAPGLSAR------KCSLTGKWDNDLGSNMTIGAVNSKGEFTGTYTTAV-TA
-------------------DPSKESKAQAAVAEAGITGTWYNQLGSTFIVTA-NPDGSLTGTYESAVGNA
TSNEIKESPLHGTQNTINKRTQPTFGFTVNWKFS----ESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVN
ESRYVLTGRYDSTPATDGSGT--ALGWTVAWKNNYRNAHSATTWSGQYV---GGAEARINTQWLLTSGTT
DIGDDWKATRVGINIFTRLRTQKE---------------------
-AANAWKSTLVGHDTFTKVKPSAASIDAAKKAGVNNGNPLDAVQQ
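The example above uses an affine gap penalty of (-10, -1) and leaves terminal gaps unpenalized. The sketch below shows how such a scoring scheme can be applied to two gapped strings. It is independent of Biotite and rests on stated assumptions: a gap of length n costs the opening penalty once plus the extension penalty for each of the remaining n - 1 positions, no column contains a gap in both rows, and a simple match/mismatch score stands in for the BLOSUM62 matrix. The function names are hypothetical.

```python
def terminal_gap_columns(row):
    """Return the column indices covered by leading or trailing gaps of a row."""
    leading = len(row) - len(row.lstrip("-"))
    trailing = len(row) - len(row.rstrip("-"))
    return set(range(leading)) | set(range(len(row) - trailing, len(row)))


def affine_gap_score(row1, row2, gap_penalty=(-10, -1), terminal_penalty=False,
                     match=1, mismatch=-1):
    """Score two equal-length gapped strings, where '-' marks a gap.

    Assumption: match/mismatch are toy scores, not a real substitution matrix.
    """
    if len(row1) != len(row2):
        raise ValueError("Both rows must have the same length")
    gap_open, gap_ext = gap_penalty

    # Columns inside terminal gaps are free if terminal gaps are not penalized
    free = set()
    if not terminal_penalty:
        free = terminal_gap_columns(row1) | terminal_gap_columns(row2)

    score = 0
    previous_gap_row = None  # Row that was gapped in the previous column
    for column, (a, b) in enumerate(zip(row1, row2)):
        if column in free:
            previous_gap_row = None
        elif a == "-" or b == "-":
            gap_row = 0 if a == "-" else 1
            # Continuing a gap in the same row costs only the extension penalty
            score += gap_ext if gap_row == previous_gap_row else gap_open
            previous_gap_row = gap_row
        else:
            score += match if a == b else mismatch
            previous_gap_row = None
    return score


print(affine_gap_score("AC--GT", "ACTTGT"))
print(affine_gap_score("--ACGT", "TTACGT"))
```

With these settings, one long gap is cheaper than several short ones, and overhanging sequence ends do not lower the score, which suits sequences of different lengths such as the two aligned above.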
More documentation, including a tutorial, an example gallery and the API reference, is available at https://www.biotite-python.org/.
Contribution
Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.