Cython bindings and Python interface to Prodigal, an ORF finder for genomes and metagenomes.

🔥 Pyrodigal

🗺️ Overview

Pyrodigal is a Python module that provides bindings to Prodigal using Cython. It directly interacts with the Prodigal internals, which has the following advantages:

  • single dependency: Pyrodigal is distributed as a Python package, so you can add it as a dependency to your project, and stop worrying about the Prodigal binary being present on the end-user machine.
  • no intermediate files: everything happens in memory, in a Python object you fully control, so you don't have to invoke the Prodigal CLI using a sub-process and temporary files.
  • no input formatting: sequences are manipulated directly as strings, which removes the need to format your input as FASTA for Prodigal.
  • lower memory usage: Pyrodigal is slightly more conservative when it comes to using memory, which can help process very large sequences. It also lets you save some more memory when running several meta-mode analyses.

📋 Features

The library now features everything needed to run Prodigal in single or metagenomic mode. It is still missing some features of the CLI:

Roadmap:

  • ✔️ Metagenomic mode
  • ✔️ Single mode
  • ❌ External training file support (-t flag)
  • ❌ Region masking (-m flag)

🐏 Memory

Unlike the Prodigal command line, Pyrodigal tries to be conservative about memory usage: most allocations are lazy, and some functions reallocate their results to exact-sized arrays when possible. As a result, Pyrodigal uses about 30% less memory, at the cost of a little extra overhead to compute buffer sizes in advance.
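
The "allocate lazily, then shrink to the exact size" idea can be illustrated in plain Python. This is only a sketch of the general technique, not Pyrodigal's actual Cython code; the GrowableBuffer class and its methods are hypothetical names invented for the illustration.

```python
from array import array


class GrowableBuffer:
    """Hypothetical buffer: allocates lazily, grows by doubling, trims on demand."""

    def __init__(self):
        self._data = None   # nothing is allocated until the first append
        self._length = 0    # number of slots actually in use

    def append(self, value):
        if self._data is None:
            self._data = array("i", [0] * 4)          # first (lazy) allocation
        elif self._length == len(self._data):
            self._data.extend([0] * len(self._data))  # double the capacity
        self._data[self._length] = value
        self._length += 1

    @property
    def capacity(self):
        return 0 if self._data is None else len(self._data)

    def to_exact(self):
        """Copy the used part into a new array of exactly the right size."""
        if self._data is None:
            return array("i")
        return self._data[: self._length]


buf = GrowableBuffer()
for n in range(5):
    buf.append(n)
result = buf.to_exact()   # holds 5 items, while buf.capacity has grown past 5
```

The trade-off is the one described above: the final copy costs a little time, but the result no longer carries unused capacity.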

🧶 Thread-safety

pyrodigal.Pyrodigal instances are thread-safe, and use an internal lock to prevent parallel calls to their methods from overwriting the internal buffers. However, a better solution to process sequences in parallel is to use a consumer/worker pattern, and have one Pyrodigal instance in each worker. Using a pool spawning Pyrodigal instances on the fly is also fine, but prevents recycling memory:

import multiprocessing.pool
from pyrodigal import Pyrodigal

with multiprocessing.pool.ThreadPool() as pool:
    results = pool.map(lambda s: Pyrodigal(meta=True).find_genes(s), sequences)
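
The consumer/worker pattern mentioned above could look like the following sketch. It is deliberately library-agnostic: run_workers and make_finder are hypothetical names, and the only assumption about the finder object is that it exposes a find_genes method, as Pyrodigal does.

```python
import queue
import threading


def run_workers(sequences, make_finder, n_workers=4):
    """Process a list of sequences with one finder instance per worker thread."""
    tasks = queue.Queue()
    results = [None] * len(sequences)

    def worker():
        finder = make_finder()      # one instance per worker, reused for every task
        while True:
            item = tasks.get()
            if item is None:        # sentinel: no more work for this worker
                break
            index, sequence = item
            results[index] = finder.find_genes(sequence)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in enumerate(sequences):
        tasks.put(item)
    for _ in threads:
        tasks.put(None)             # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

With Pyrodigal, make_finder would be something like `lambda: Pyrodigal(meta=True)`, so that each worker keeps and recycles its own instance instead of creating a new one per sequence.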

🔧 Installing

Pyrodigal can be installed directly from PyPI, which hosts some pre-built CPython wheels for x86-64 Unix and Windows platforms, as well as the code required to compile from source with Cython:

$ pip install pyrodigal

Otherwise, Pyrodigal is also available as a Bioconda package:

$ conda install -c bioconda pyrodigal

💡 Example

Let's load a sequence from a GenBank file, use Pyrodigal to find all the genes it contains, and print the proteins in two-line FASTA format.

🔬 Biopython

To use Pyrodigal in single mode, you must explicitly call Pyrodigal.train with the sequence you want to use for training before trying to find genes, or you will get a RuntimeError:

p = pyrodigal.Pyrodigal()
p.train(bytes(record.seq))
genes = p.find_genes(bytes(record.seq))
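
A small helper can make the train-then-predict order hard to get wrong. The sketch below relies only on the train and find_genes methods shown above; the function name find_genes_single is hypothetical, and finder is expected to be a Pyrodigal instance created without meta=True.

```python
def find_genes_single(finder, sequence, training_sequence=None):
    """Train `finder`, then predict genes on `sequence`.

    If no separate training sequence is given, the sequence itself is used
    for training, as in the example above.
    """
    if training_sequence is None:
        training_sequence = sequence
    finder.train(training_sequence)   # must happen before find_genes in single mode
    return finder.find_genes(sequence)
```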

However, in meta mode, you can find genes directly:

import Bio.SeqIO
import pyrodigal

record = Bio.SeqIO.read("sequence.gbk", "genbank")
p = pyrodigal.Pyrodigal(meta=True)

for i, gene in enumerate(p.find_genes(bytes(record.seq))):
    print(f">{record.id}_{i+1}")
    print(gene.translate())  # translate the predicted gene, not the whole record

On older versions of Biopython (before 1.79) you will need to use record.seq.encode() instead of bytes(record.seq).
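
If the two-line FASTA output is needed as a string rather than printed, the formatting can be factored out. This is a minimal sketch: to_two_line_fasta is a hypothetical helper, and the identifier and protein strings in the usage line are made-up placeholders, not real predictions.

```python
def to_two_line_fasta(record_id, proteins):
    """Format protein strings as two-line FASTA records (header line + sequence line)."""
    lines = []
    for i, protein in enumerate(proteins, start=1):
        lines.append(f">{record_id}_{i}")   # same numbering as the loop above
        lines.append(str(protein))          # whole sequence on a single line
    return "\n".join(lines)


# Placeholder data, for illustration only.
fasta = to_two_line_fasta("contig1", ["MKV", "MAL"])
```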

🧪 Scikit-bio

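import skbio.io
import pyrodigal

seq = next(skbio.io.read("sequence.gbk", "genbank"))
p = pyrodigal.Pyrodigal(meta=True)

# Assumes the GenBank reader stored the LOCUS name in the sequence metadata.
seq_id = seq.metadata["LOCUS"]["locus_name"]

for i, gene in enumerate(p.find_genes(seq.values.view('B'))):
    print(f">{seq_id}_{i+1}")
    print(gene.translate())  # translate the predicted gene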

The view('B') call is needed to expose the sequence values as an array of unsigned char, which is the form Cython can read.

💭 Feedback

⚠️ Issue Tracker

Found a bug? Have an enhancement request? Head over to the GitHub issue tracker if you need to report or ask something. If you are filing a bug report, please include as much information as you can about the issue, and try to recreate the same bug in a simple, easily reproducible situation.

🏗️ Contributing

Contributions are more than welcome! See CONTRIBUTING.md for more details.

⚖️ License

This library is provided under the GNU General Public License v3.0. The Prodigal code was written by Doug Hyatt and is distributed under the terms of the GPLv3 as well. See Prodigal/LICENSE for more information.

This project is in no way affiliated with, sponsored by, or otherwise endorsed by the original Prodigal authors. It was developed by Martin Larralde during his PhD project at the European Molecular Biology Laboratory in the Zeller team.
