A comprehensive library for computational molecular biology
Biotite project
Biotite is your Swiss army knife for bioinformatics. Whether you want to identify homologous sequence regions in a protein family or you would like to find disulfide bonds in a protein structure: Biotite has the right tool for you. This package bundles popular tasks in computational molecular biology into a uniform Python library. It can handle a major part of the typical workflow for sequence and biomolecular structure data:
Searching and fetching data from biological databases
Reading and writing popular sequence/structure file formats
Analyzing and editing sequence/structure data
Visualizing sequence/structure data
Interfacing external applications for further analysis
Biotite internally stores most of the data as NumPy ndarray objects, enabling
fast C-accelerated analysis,
intuitive usability through NumPy-like indexing syntax,
extensibility through direct access of the internal NumPy arrays.
As a result, the user can skip writing code for basic functionality (like file parsers) and focus on what makes their code unique, from small analysis scripts to entire bioinformatics software packages.
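The NumPy-based storage described above can be illustrated with a minimal sketch in plain NumPy. It does not use Biotite's own classes: the `ALPHABET` string and the `encode` and `decode` helpers are hypothetical names chosen for this illustration. The sketch only shows the general idea of keeping a sequence as an array of integer codes, so that slicing and vectorized comparisons work directly on it.

```python
import numpy as np

# Hypothetical four-letter alphabet, for illustration only
# (this is not Biotite's alphabet class)
ALPHABET = "ACGT"
SYMBOLS = np.array(list(ALPHABET))
LOOKUP = {symbol: index for index, symbol in enumerate(ALPHABET)}


def encode(symbols):
    """Map each symbol to its integer index in ALPHABET."""
    return np.array([LOOKUP[s] for s in symbols], dtype=np.uint8)


def decode(code):
    """Map an array of integer codes back to a string."""
    return "".join(SYMBOLS[code])


code = encode("ACGTTGCA")

# NumPy-like indexing: a slice of the code array is again a code array
sub = code[2:6]
print(decode(sub))

# Vectorized analysis: fraction of positions that are C (code 1) or G (code 2)
gc_content = np.mean((code == 1) | (code == 2))
print(gc_content)
```

Because the data is an ordinary array, any NumPy function can be applied to it without a conversion step.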
If you use Biotite in a scientific publication, please cite:
Installation
Biotite requires the following packages:
numpy
requests
msgpack
networkx
Some functions require additional packages:
mdtraj - Required for trajectory file I/O operations.
matplotlib - Required for plotting purposes.
Biotite can be installed via Conda…
$ conda install -c conda-forge biotite
… or pip
$ pip install biotite
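To check whether the packages listed above can be imported in the current environment, a short standard-library script along these lines may help. It is only a sketch: the `missing_packages` helper is a hypothetical name, and the script assumes that each package is imported under the same name it is listed with.

```python
from importlib.util import find_spec

# Package names as listed in the requirements above
REQUIRED = ["numpy", "requests", "msgpack", "networkx"]
OPTIONAL = ["mdtraj", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found as importable modules."""
    return [name for name in names if find_spec(name) is None]


print("Missing required packages:", missing_packages(REQUIRED))
print("Missing optional packages:", missing_packages(OPTIONAL))
```

An empty list means every package in that group was found.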
Usage
Here is a small example that downloads two protein sequences from the NCBI Entrez database and aligns them:
import biotite.sequence.align as align
import biotite.sequence.io.fasta as fasta
import biotite.database.entrez as entrez
# Download FASTA file for the sequences of avidin and streptavidin
file_name = entrez.fetch_single_file(
    uids=["CAC34569", "ACL82594"], file_name="sequences.fasta",
    db_name="protein", ret_type="fasta"
)
# Parse the downloaded FASTA file
# and create 'ProteinSequence' objects from it
fasta_file = fasta.FastaFile.read(file_name)
avidin_seq, streptavidin_seq = fasta.get_sequences(fasta_file).values()
# Align sequences using the BLOSUM62 matrix with affine gap penalty
matrix = align.SubstitutionMatrix.std_protein_matrix()
alignments = align.align_optimal(
    avidin_seq, streptavidin_seq, matrix,
    gap_penalty=(-10, -1), terminal_penalty=False
)
print(alignments[0])
MVHATSPLLLLLLLSLALVAPGLSAR------KCSLTGKWDNDLGSNMTIGAVNSKGEFTGTYTTAV-TA
-------------------DPSKESKAQAAVAEAGITGTWYNQLGSTFIVTA-NPDGSLTGTYESAVGNA
TSNEIKESPLHGTQNTINKRTQPTFGFTVNWKFS----ESTTVFTGQCFIDRNGKEV-LKTMWLLRSSVN
ESRYVLTGRYDSTPATDGSGT--ALGWTVAWKNNYRNAHSATTWSGQYV---GGAEARINTQWLLTSGTT
DIGDDWKATRVGINIFTRLRTQKE---------------------
-AANAWKSTLVGHDTFTKVKPSAASIDAAKKAGVNNGNPLDAVQQ
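The example above passes `gap_penalty=(-10, -1)` and `terminal_penalty=False`. The following pure-Python sketch shows how such an affine gap penalty could be tallied for two gapped rows of an alignment. It is not Biotite's implementation: the `gap_score` function and the toy rows are hypothetical. It assumes the common convention that the first position of a gap costs the opening penalty, that each further position costs the extension penalty, and that gaps before the first or after the last symbol of a row are free when terminal penalties are disabled.

```python
def gap_score(row_a, row_b, gap_open=-10, gap_ext=-1, terminal_penalty=False):
    """Sum the affine gap penalties over two gapped rows of one alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("Both rows must have the same number of columns")
    total = 0
    for row in (row_a, row_b):
        # Columns of the first and last actual symbol in this row
        symbol_columns = [i for i, c in enumerate(row) if c != "-"]
        first, last = symbol_columns[0], symbol_columns[-1]
        in_gap = False
        for i, c in enumerate(row):
            if c != "-":
                in_gap = False
                continue
            # Skip leading/trailing gaps if terminal gaps are not penalized
            if not terminal_penalty and (i < first or i > last):
                continue
            total += gap_ext if in_gap else gap_open
            in_gap = True
    return total


# Toy rows, not taken from the alignment above
row_a = "ACGT--ACGT-A"
row_b = "--GTTTAC-TTA"

print(gap_score(row_a, row_b))
print(gap_score(row_a, row_b, terminal_penalty=True))
```

With terminal penalties disabled, the leading gap in the second row adds nothing to the score, which is typically the desired behavior when one sequence is expected to extend beyond the other.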
More documentation, including a tutorial, an example gallery and the API reference, is available at https://www.biotite-python.org/.
Contribution
Interested in improving Biotite? Have a look at the contribution guidelines. Feel free to join our community chat on Discord.