MMSEQS Python bindings

Project description

MMseqs2 bindings for Python

This project provides Python bindings for MMseqs2. It is still a work in progress. The basic usage scenario looks like this:

import mmseqs

#
# Demonstration of basic mmseqs2 operations
#

# Create a client
client = mmseqs.MMSeqs()

# Create a database from a FASTA file.
# The arguments are the database name, a description, and the input file.
# (The input can also be a list, iterator, etc. of Seq/SeqRecord objects.)
client.databases.create("test", "Test database", "example/a.fasta")

# Get description of the database
print(client.databases[0].description)

# Perform a search on a database.
# Note that the search queries can also be a string with the path to a FASTA file containing the queries.
results = client.databases[0].search(
    [
        "ACTAGCTCAGTCAACTAGCTCAGTCCTCAGTCAACTAGCTCAGTCTATATATATACAAC",
        "ACTAGCTCAGTCAACTAGCTCAGTCCTCAGTCAACTAGCT",
        "ACTAGCTCAGTCAACTAGCT",
        "ACTAGCTCAGT",
    ],
    search_type="nucleotides",
)

# Load queries from file:
# results = client.databases[0].search_file("input.fasta", search_type="nucleotides")

# You can pass a list of headers to select which fields are returned:
#   query_sequence_id
#   target_sequence_id
#   query_sequence_content
#   target_sequence_content
#   sequence_identity
#   alignment_length
#   number_of_mismatches
#   number_of_gap_openings
#   domain_start_index_query
#   domain_end_index_query
#   domain_start_index_target
#   domain_end_index_target
#   e_value
#   bit_score
# For example:
# results2 = client.databases[0].search(
#     [
#         "ACTAGCTCAGTCAACTAGCTCAGTCCTCAGTCAACTAGCTCAGTCTATATATATACAAC",
#         "ACTAGCTCAGTCAACTAGCTCAGTCCTCAGTCAACTAGCT",
#         "ACTAGCTCAGTCAACTAGCT",
#         "ACTAGCTCAGT",
#     ],
#     search_type="nucleotides",
#     headers=["query_sequence_id", "target_sequence_id", "sequence_identity", "alignment_length", "number_of_mismatches"]
# )

# results.records is a list of lists: one list of alignments per query.
# Each entry in a list of alignments is a single result.
# print(results.records)

# You can also get the results as a pandas DataFrame
print(results.dataframe)
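
The listing above notes that queries can also be supplied as a FASTA file. As a minimal sketch of how such a file might be prepared with only the standard library: `write_fasta`, the `query_N` header names, and the 60-character line width are assumptions made for illustration and are not part of the mmseqs package.

```python
import os
import tempfile


def write_fasta(sequences, path, line_width=60):
    """Write sequences to `path` in FASTA format.

    Hypothetical helper, not part of the mmseqs package.
    Records get generated headers: >query_0, >query_1, ...
    """
    with open(path, "w") as handle:
        for index, sequence in enumerate(sequences):
            handle.write(f">query_{index}\n")
            # Wrap long sequences at a fixed width, as is conventional for FASTA
            for start in range(0, len(sequence), line_width):
                handle.write(sequence[start:start + line_width] + "\n")


queries = [
    "ACTAGCTCAGTCAACTAGCTCAGTCCTCAGTCAACTAGCT",
    "ACTAGCTCAGTCAACTAGCT",
    "ACTAGCTCAGT",
]

# Write the queries to a temporary file and read it back for inspection
fasta_path = os.path.join(tempfile.mkdtemp(), "queries.fasta")
write_fasta(queries, fasta_path)

with open(fasta_path) as handle:
    fasta_text = handle.read()
print(fasta_text)

# The resulting path could then be used as in the listing above, e.g.:
# results = client.databases[0].search_file(fasta_path, search_type="nucleotides")
```

The final commented line reuses the `search_file` call exactly as it appears in the listing; whether the bindings require particular header names in the FASTA file is not stated in this description.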

Download files

Source Distribution

mmseqs-1.0.0.tar.gz (4.3 MB)

Uploaded Source

Built Distributions

mmseqs-1.0.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (36.8 MB)

Uploaded CPython 3.9 manylinux: glibc 2.12+ x86-64

mmseqs-1.0.0-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.whl (35.8 MB)

Uploaded CPython 3.9 manylinux: glibc 2.12+ i686

mmseqs-1.0.0-cp39-cp39-macosx_10_15_x86_64.whl (11.0 MB)

Uploaded CPython 3.9 macOS 10.15+ x86-64

mmseqs-1.0.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (36.8 MB)

Uploaded CPython 3.8 manylinux: glibc 2.12+ x86-64

mmseqs-1.0.0-cp38-cp38-manylinux2014_x86_64.whl (33.7 MB)

Uploaded CPython 3.8

mmseqs-1.0.0-cp38-cp38-manylinux2010_i686.manylinux_2_12_i686.whl (35.8 MB)

Uploaded CPython 3.8 manylinux: glibc 2.12+ i686

mmseqs-1.0.0-cp38-cp38-macosx_10_15_x86_64.whl (11.0 MB)

Uploaded CPython 3.8 macOS 10.15+ x86-64
