peanut

A massively parallel GPU read mapper using OpenCL.

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

PEANUT is a read mapper for DNA or RNA sequence reads. Read mapping is the process of aligning biological DNA or RNA sequencing reads to a known reference genome.

By exploiting the massive parallelism of modern graphics processors and a novel index datastructure, PEANUT achieves supreme sensitivity and speed compared to current state of the art read mappers like BWA MEM, Bowtie2 and RazerS3. PEANUT thereby allows to report both only the best hits or all hits.

For details about PEANUT, please refer to our scientific paper “Massively Parallel Read Mapping on GPUs with PEANUT” (under review). A preprint is available at http://arxiv.org/abs/1403.1706.

Requirements

A POSIX compatible OS (e.g. Linux, MacOS X, FreeBSD)
an NVIDIA GPU with up-to-date proprietary drivers (this may change in the future as the AMD drivers become more mature)
Python >= 3.2
Numpy >= 1.7
Cython >= 0.19
Pyopencl >= 2013.1

Installation

Using easy_install, all dependencies will be installed automatically if missing:

$ easy_install3 peanut

When installing manually from setup.py, just execute:

$ python3 setup.py install

Usage

PEANUT will be available as a command line tool. To index a reference genome genome.fasta, issue the following:

$ peanut index genome.fasta genome.index.hdf5

To map paired end reads reads.1.fastq and reads.2.fastq onto the indexed reference, use the following invocation:

$ peanut map --threads 8 genome.index.hdf5 reads.1.fastq reads.2.fastq | samtools view -Sb - > mapped.bam

As can be seen, PEANUT outputs hits in the SAM format. Hence, output has to be piped into samtools to obtain a BAM file. PEANUT buffers reads and hits in GPU and CPU memory. The default buffer settings of PEANUT are optimized for a GPU with at least 2.5GB memory and a CPU with 16GB memory (but 8GB should do, too). You can lower both buffer sizes to adapt for weaker systems, e.g.:

$ peanut map --read-buffer 100000 --hits-buffer 500000 ...

This however can reduce performance since the amount of possible parallelelism on the GPU is affected. For further help, invoke:

$ peanut --help

or visit http://peanut.readthedocs.org.

News

5 Sep 2014: Release 1.3.2 of PEANUT. Fixed a bug causing an invalid OpenCL work group size with uneven number of reads (special thanks to Sean Li for reporting this).
18 Aug 2014: Release 1.3 of PEANUT. Added rescue mode for paired-end sequencing.
7 Jul 2014: Release 1.2 of PEANUT. Improved mapping quality estimate that reflects the original posterior probability like defined in the MAQ paper of Heng Li.
26 May 2014: Release 1.1 of PEANUT. Reduced memory usage (at most 1/2 if you are lucky).
16 May 2014: Release 1.0.3 of PEANUT. Changed the argument order for the map subcommand to agree with other mappers.
7 May 2014: Release 1.0.2 of PEANUT. More fixes for alignment selection in case of paired end reads. Fixed missing import and –query-buffer not being considered.
30 Apr 2014: Release 1.0.1 of PEANUT. Improved flag usage in SAM output. Rescaled mapping quality in accordance with the paper. Fixed rare cases of where the wrong mate alignment was chosen as the best aligment.

Citation

If you use PEANUT in your publication, please cite the following preprint for now:

Köster, J., Rahmann, S., 2014. Massively parallel read mapping on GPUs with PEANUT. arXiv:1403.1706.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Natural Language
- English
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

Release history Release notifications | RSS feed

1.3.7

Jul 16, 2015

1.3.6

Apr 25, 2015

1.3.5

Nov 9, 2014

1.3.4

Nov 9, 2014

1.3.3

Nov 9, 2014

This version

1.3.2

Sep 5, 2014

1.3.1

Aug 18, 2014

1.3

Aug 18, 2014

1.2

Jul 7, 2014

1.1

May 26, 2014

1.0.2

May 7, 2014

1.0.1

Apr 30, 2014

1.0

Mar 27, 2014

0.2

Mar 10, 2014

0.1

Mar 7, 2014

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

peanut-1.3.2.tar.gz (697.4 kB view hashes)

Uploaded Sep 5, 2014 Source

Hashes for peanut-1.3.2.tar.gz

Hashes for peanut-1.3.2.tar.gz
Algorithm	Hash digest
SHA256	`ef2c2b8a2e895ba348afd5943ce737aa32a3fffcd3651e00782a929bffd96c3f`
MD5	`e4819f69ec01249eb9bef2e5c1825987`
BLAKE2b-256	`32a87cd20fa972451a02f2f1c0117201178f99529214539fe91a5bf5456f170e`