
Python bindings for the GuppyClient library.

Project description

Important notice

ont-pyguppy-client-lib has been renamed to ont-pybasecall-client-lib. This project will no longer be updated, but it remains available to support basecall servers up to version 7.2.x. For Python bindings to dorado_basecall_server 7.3.0 onwards, please use ont-pybasecall-client-lib.
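The choice of package comes down to comparing the server version against 7.3.0. The helper below is a hypothetical sketch (it is not part of either package) and assumes a dotted version string such as "7.2.15", optionally followed by a "+build" suffix like the one printed by `--version`. Remember that the package version should also match the server version exactly.

```python
def client_package_for(server_version: str) -> str:
    """Return the pip package name to use with a given basecall server version.

    Hypothetical helper: assumes versions look like "7.2.15" or "7.1.1+effbaf8".
    """
    # Drop any "+build" suffix, then compare numerically (not as strings,
    # so that "7.10.0" correctly sorts after "7.3.0")
    core = server_version.split("+", 1)[0]
    parts = [int(p) for p in core.split(".")]
    parts += [0] * (3 - len(parts))  # pad "7.3" to (7, 3, 0)
    if tuple(parts[:3]) >= (7, 3, 0):
        return "ont-pybasecall-client-lib"
    return "ont-pyguppy-client-lib"


print(client_package_for("7.1.1+effbaf8"))  # ont-pyguppy-client-lib
print(client_package_for("7.3.0"))          # ont-pybasecall-client-lib
```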

ont-pyguppy-client-lib

ont-pyguppy-client-lib provides python bindings for connecting to a Dorado basecall server. It allows you to interact with the server to do anything you could normally do using the ont_basecall_client. This includes:

  • Basecalling

  • Barcoding / demultiplexing

  • Alignment

For example:

>>> from pyguppy_client_lib.pyclient import PyGuppyClient
>>> client = PyGuppyClient(
...     "127.0.0.1:5555",
...     "dna_r9.4.1_450bps_fast",
...     align_ref="/path/to/index.mmi",
...     bed_file="/path/to/targets.bed"
... )
>>> client.connect()

Getting started

ont-pyguppy-client-lib is available on PyPI and may be installed via pip:

pip install ont-pyguppy-client-lib

ont-pyguppy-client-lib requires a running instance of the Dorado basecall server. ont-dorado-server may be obtained from the Oxford Nanopore Community.

The version of ont-pyguppy-client-lib should exactly match the version of ont-dorado-server being used. You can find your ont-dorado-server version like this:

$ <location of dorado_basecall_server>/dorado_basecall_server --version

For example, this Dorado basecall server is version 7.1.1:

$ ./ont-dorado-server/bin/dorado_basecall_server --version
: Dorado Basecall Service Software, (C) Oxford Nanopore Technologies,  Limited. Version 7.1.1+effbaf8, client-server API version 16.0.0

Install a specific version of ont-pyguppy-client-lib like this:

pip install ont-pyguppy-client-lib==<version>
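Picking the matching version can be scripted by pulling the "Version X.Y.Z" token out of the `--version` banner. This is a sketch that assumes the banner keeps the format shown above; `get_server_version` is a hypothetical helper, assumes a Linux/macOS executable name (no `.exe`), and reads both stdout and stderr because which stream the banner goes to is not documented here.

```python
import re
import subprocess
from pathlib import Path

# Matches the capitalised "Version X.Y.Z" token, not the lowercase
# "client-server API version" that follows it
_VERSION_RE = re.compile(r"Version (\d+\.\d+\.\d+)")


def parse_server_version(banner: str) -> str:
    """Extract "X.Y.Z" from dorado_basecall_server --version output."""
    match = _VERSION_RE.search(banner)
    if match is None:
        raise ValueError("no version found in server output")
    return match.group(1)


def get_server_version(bin_dir: str) -> str:
    """Run the server binary with --version and return its version."""
    exe = Path(bin_dir) / "dorado_basecall_server"
    result = subprocess.run([str(exe), "--version"],
                            capture_output=True, text=True, check=True)
    return parse_server_version(result.stdout + result.stderr)


banner = (": Dorado Basecall Service Software, (C) Oxford Nanopore Technologies,  "
          "Limited. Version 7.1.1+effbaf8, client-server API version 16.0.0")
version = parse_server_version(banner)
print(f"pip install ont-pyguppy-client-lib=={version}")
# pip install ont-pyguppy-client-lib==7.1.1
```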

Dependencies

ont-pyguppy-client-lib requires numpy in order to run. To use the included helper functions for reading data from fast5 and/or pod5 files, you must also install ont-fast5-api and/or pod5 manually:

pip install ont-fast5-api pod5
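A quick way to see which of these packages are importable is `importlib.util.find_spec`. The sketch below assumes the usual import names (`numpy`, `ont_fast5_api`, `pod5`) for the pip packages listed above; it only checks that the modules can be found, not their versions.

```python
import importlib.util
from typing import List

# pip package name -> import name, for the required and optional dependencies
DEPENDENCIES = {
    "numpy": "numpy",                  # required
    "ont-fast5-api": "ont_fast5_api",  # optional: reading fast5 files
    "pod5": "pod5",                    # optional: reading pod5 files
}


def missing_dependencies() -> List[str]:
    """Return the pip names of dependencies that cannot be imported."""
    return [pip_name for pip_name, module in DEPENDENCIES.items()
            if importlib.util.find_spec(module) is None]


missing = missing_dependencies()
if missing:
    print("pip install " + " ".join(missing))
else:
    print("All dependencies are installed.")
```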

Documentation and help

Information on the available methods can be viewed with Python’s built-in help() function:

>>> from pyguppy_client_lib import pyclient
>>> help(pyclient)
>>> from pyguppy_client_lib import client_lib
>>> help(client_lib)

Interface / Examples

ont-pyguppy-client-lib comprises three Python modules:

  1. pyclient: a user-friendly wrapper around client_lib. This is what you should use to interact with a Dorado basecall server.

  2. client_lib: a compiled library that provides direct Python bindings to Dorado’s C++ GuppyClient API.

  3. helper_functions: a set of functions for running a Dorado basecall server and loading reads from fast5 and/or pod5 files.

Starting a basecall server

A Dorado basecall server must be running before the client can communicate with it. On most Oxford Nanopore devices a basecall server is always running on port 5555. On other devices, or if you want to run a separate basecall server, you must start one yourself:

from pyguppy_client_lib import helper_functions

# A basecall server requires:
#  * A location to put log files (on your PC)
#  * An initial config file to load
#  * A port to run on
server_args = ["--log_path", "/home/myuser/guppy_server_logs",
               "--config", "dna_r9.4.1_450bps_fast.cfg",
               "--port", "5556"]  # pass every argument as a string
# The second argument is the directory where the
# dorado_basecall_server executable is found. Update this as
# appropriate.
helper_functions.run_server(server_args, "/home/myuser/ont-dorado/bin")

See the DOCUMENTATION.md file in the ont-dorado-server archive for more information on server arguments.
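Before starting a new server it can help to check whether something is already listening on the default port. This is a sketch using only the standard library: a successful TCP connection shows that some process accepts connections on that port, not that it is a Dorado basecall server. `server_is_listening` and `choose_server_args` are hypothetical helpers, and the paths and ports are the example values from above.

```python
import socket
from typing import List


def server_is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if some process accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success instead of raising
            return sock.connect_ex((host, port)) == 0
        except OSError:  # includes socket.timeout
            return False


def choose_server_args(log_path: str, config: str, port: int) -> List[str]:
    """Build a server argument list; every element is a string."""
    return ["--log_path", log_path, "--config", config, "--port", str(port)]


if server_is_listening("127.0.0.1", 5555):
    print("Reusing the basecall server on 127.0.0.1:5555")
else:
    args = choose_server_args("/home/myuser/guppy_server_logs",
                              "dna_r9.4.1_450bps_fast.cfg", 5556)
    print("Start a server with:", args)
```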

Basecall and align using PyGuppyClient

from pyguppy_client_lib.pyclient import PyGuppyClient

client = PyGuppyClient(
    "127.0.0.1:5555",
    "dna_r9.4.1_450bps_fast",
    align_ref="/path/to/align_ref.fasta",
    bed_file="/path/to/bed_file.bed"
)
client.connect()

Note that reading input files with the helper_functions module requires ont-fast5-api and/or pod5 to be installed:

from pyguppy_client_lib.helper_functions import basecall_with_pyguppy

# Using the client created in the previous example
called_reads = basecall_with_pyguppy(
    client,
    "/path/to/input_folder"
)

for read in called_reads:
    read_id = read['metadata']['read_id']
    alignment_genome = read['metadata']['alignment_genome']
    sequence = read['datasets']['sequence']
    print(f"{read_id} sequence length is {len(sequence)}, "
          f"alignment_genome is {alignment_genome}")
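Because each called read is a plain dictionary, results are easy to aggregate. The sketch below groups reads by alignment genome and reports the read count and mean sequence length; it assumes only the `read['metadata']['alignment_genome']` and `read['datasets']['sequence']` keys used above, and the example reads are made-up data in that shape.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def summarise_by_genome(reads: Iterable[dict]) -> Dict[str, Tuple[int, float]]:
    """Map alignment_genome -> (read count, mean sequence length)."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    for read in reads:
        genome = read["metadata"]["alignment_genome"]
        lengths[genome].append(len(read["datasets"]["sequence"]))
    return {genome: (len(ls), sum(ls) / len(ls))
            for genome, ls in lengths.items()}


# Made-up reads that mirror the structure returned above
example_reads = [
    {"metadata": {"read_id": "r1", "alignment_genome": "chr1"},
     "datasets": {"sequence": "ACGTACGT"}},
    {"metadata": {"read_id": "r2", "alignment_genome": "chr1"},
     "datasets": {"sequence": "ACGT"}},
    {"metadata": {"read_id": "r3", "alignment_genome": "chr2"},
     "datasets": {"sequence": "ACG"}},
]
print(summarise_by_genome(example_reads))
# {'chr1': (2, 6.0), 'chr2': (1, 3.0)}
```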

Basecall and get states, moves and modbases using GuppyClient

To retrieve the movement dataset, the move_and_trace_enabled option must be set to True; likewise, to retrieve the state_data dataset, post_out must be set to True. NOTE: Don't turn on post_out unless you need the states: it generates a LOT of extra output data and can seriously hurt performance. The same applies to move_and_trace_enabled, although it is much less expensive.
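Following that advice, a small hypothetical helper can build the options dictionary so that the expensive outputs are opt-in. It covers only the option names used in the example that follows; 'priority' is left out because its value (GuppyClient.high_priority) comes from the compiled library, so add it back when using the real client.

```python
from typing import Dict, Union


def basecall_options(client_name: str, need_moves: bool = False,
                     need_states: bool = False) -> Dict[str, Union[str, bool]]:
    """Build a set_params() dictionary that enables only the outputs needed."""
    return {
        "client_name": client_name,
        # Comparatively cheap: only needed for the 'movement' dataset
        "move_and_trace_enabled": need_moves,
        # Very expensive: only needed for the 'state_data' dataset
        "post_out": need_states,
    }


print(basecall_options("test_client", need_moves=True))
```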

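from pyguppy_client_lib.client_lib import GuppyClient
from pyguppy_client_lib.helper_functions import basecall_with_pyguppy

port_path = "127.0.0.1:5555"          # address of a running basecall server
input_path = "/path/to/input_folder"  # folder of fast5 and/or pod5 files

options = {'priority': GuppyClient.high_priority,
           'client_name': "test_client",
           'move_and_trace_enabled': True,
           'post_out': True}

client = GuppyClient(port_path, 'dna_r9.4.1_e8.1_modbases_5mc_cg_fast')
result = client.set_params(options)
result = client.connect()

called_reads = basecall_with_pyguppy(client, input_path)

for read in called_reads:
    read_id = read['metadata']['read_id']
    base_mod_context = read['metadata']['base_mod_context']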
    base_mod_alphabet = read['metadata']['base_mod_alphabet']

    sequence = read['datasets']['sequence']
    movement = read['datasets']['movement']
    state_data = read['datasets']['state_data']
    base_mod_probs = read['datasets']['base_mod_probs']

    print(f"{read_id} sequence length is {len(sequence)}, "
          f"base_mod_context is {base_mod_context}, base_mod_alphabet is {base_mod_alphabet}, "
          f"movement size is {movement.shape}, state_data size is {state_data.shape}, "
          f"base_mod_probs size is {base_mod_probs.shape}")
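The movement dataset is commonly interpreted as a move table: one entry per model stride (a block of signal samples), set to 1 where the basecaller emitted a new base. Under that interpretation, the signal position at which each base starts can be recovered as sketched below. The model stride and the number of samples trimmed from the start of the read are passed in as parameters; where to obtain them for your model (for example, from the read metadata) is an assumption to verify against your server's output. As a sanity check, the number of positions should equal len(sequence).

```python
from typing import Iterable, List


def move_table_to_sample_starts(movement: Iterable[int], stride: int,
                                trimmed_samples: int = 0) -> List[int]:
    """Return the signal sample index at which each called base starts.

    Assumes movement[i] == 1 means a new base was emitted in stride block i,
    and that block 0 begins right after the trimmed samples.
    Works with plain lists or numpy arrays.
    """
    return [trimmed_samples + i * stride
            for i, move in enumerate(movement) if move]


starts = move_table_to_sample_starts([1, 0, 1, 1, 0, 0, 1], stride=5,
                                     trimmed_samples=10)
print(starts)  # [10, 20, 25, 40]
```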

Glossary of Terms:

Dorado - Oxford Nanopore Technologies’ production basecaller, which translates electrical signals measured from nanopores into DNA or RNA bases.

Fast5 - an implementation of the HDF5 file format, with specific data schemas for Oxford Nanopore Technologies sequencing data.

Pod5 - a file format for storing nanopore sequencing data, including raw signal, in an easily accessible way.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions

  • ont_pyguppy_client_lib-7.2.15-cp311-cp311-win_amd64.whl (1.6 MB): CPython 3.11, Windows x86-64
  • ont_pyguppy_client_lib-7.2.15-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.3 MB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • ont_pyguppy_client_lib-7.2.15-cp311-cp311-macosx_12_0_arm64.whl (3.8 MB): CPython 3.11, macOS 12.0+ ARM64
  • ont_pyguppy_client_lib-7.2.15-cp311-cp311-macosx_10_15_x86_64.whl (4.0 MB): CPython 3.11, macOS 10.15+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp310-cp310-win_amd64.whl (1.6 MB): CPython 3.10, Windows x86-64
  • ont_pyguppy_client_lib-7.2.15-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.3 MB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • ont_pyguppy_client_lib-7.2.15-cp310-cp310-macosx_12_0_arm64.whl (3.8 MB): CPython 3.10, macOS 12.0+ ARM64
  • ont_pyguppy_client_lib-7.2.15-cp310-cp310-macosx_10_15_x86_64.whl (4.0 MB): CPython 3.10, macOS 10.15+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp39-cp39-win_amd64.whl (1.6 MB): CPython 3.9, Windows x86-64
  • ont_pyguppy_client_lib-7.2.15-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp39-cp39-macosx_12_0_arm64.whl (3.8 MB): CPython 3.9, macOS 12.0+ ARM64
  • ont_pyguppy_client_lib-7.2.15-cp39-cp39-macosx_10_15_x86_64.whl (4.0 MB): CPython 3.9, macOS 10.15+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp38-cp38-win_amd64.whl (1.6 MB): CPython 3.8, Windows x86-64
  • ont_pyguppy_client_lib-7.2.15-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp38-cp38-macosx_10_15_x86_64.whl (4.0 MB): CPython 3.8, macOS 10.15+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp37-cp37m-win_amd64.whl (1.6 MB): CPython 3.7m, Windows x86-64
  • ont_pyguppy_client_lib-7.2.15-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • ont_pyguppy_client_lib-7.2.15-cp37-cp37m-macosx_10_15_x86_64.whl (4.0 MB): CPython 3.7m, macOS 10.15+ x86-64
