
trim adapters from high-throughput sequencing reads

.. image:: https://travis-ci.org/jdidion/atropos.svg?branch=master
    :target: https://travis-ci.org/jdidion/atropos

.. image:: https://img.shields.io/pypi/v/atropos.svg?branch=master
    :target: https://pypi.python.org/pypi/atropos

=======
Atropos
=======

Atropos is a tool for specific, sensitive, and speedy trimming of NGS reads. It is
a fork of the venerable `Cutadapt <https://github.com/marcelm/cutadapt>`_ read trimmer
(`DOI:10.14806/ej.17.1.200 <http://dx.doi.org/10.14806/ej.17.1.200>`_), with the primary
improvements being:

1. Multi-threading support, including an extremely fast "parallel write" mode.
2. Implementation of a new insert alignment-based trimming algorithm for paired-end reads that is substantially more sensitive and specific than the original Cutadapt adapter alignment-based algorithm.
3. Options for trimming specific types of data (miRNA, bisulfite-seq).
4. A new command (``detect``) that detects adapter sequences and other potential contaminants (this is experimental).
5. The ability (currently limited) to merge overlapping reads.
6. The ability to write the summary report and log messages to separate files.
7. The ability to write interleaved FASTQ output.
8. A progress bar, and other minor usability enhancements.

Dependencies
------------

* Python 3.3+ (Python 2.x is NOT supported)
* Cython 0.24+ (``pip install Cython``)
* progressbar or tqdm (optional, if you want progress bar support)
* khmer 2.0+ (``pip install khmer``) (optional, if you want adapter detection support)

Installation
------------

``pip install atropos``

Usage
-----

Atropos is backward-compatible with Cutadapt, with one key difference: input file names
must be given with options (for example ``-se`` for single-end reads) rather than as
positional arguments. If you currently use Cutadapt, you can install Atropos, substitute
the executable name in your command line, and adjust how the input files are passed. For example:

.. code-block:: bash

    atropos -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTTA -o trimmed.fq.gz -se reads.fq.gz

To take advantage of multi-threading, set the ``--threads`` option:

.. code-block:: bash

    atropos --threads 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTTA -o trimmed.fq.gz -se reads.fq.gz
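
The thread count does not have to be hard-coded. The following is a minimal sketch, assuming a bash shell in which ``nproc`` or ``getconf`` is available. The helper name ``atropos_cmd`` is hypothetical and not part of Atropos; it only prints a command line built from the options shown above, so the result can be inspected before it is run:

.. code-block:: bash

    #!/usr/bin/env bash
    ADAPTER=AGATCGGAAGAGCACACGTCTGAACTCCAGTCACGAGTTA

    # Hypothetical helper (not part of Atropos): print, but do not run,
    # a single-end Atropos command.
    # Arguments: <threads> <input.fq.gz> <output.fq.gz>
    atropos_cmd() {
        local threads=$1 input=$2 output=$3
        echo "atropos --threads ${threads} -a ${ADAPTER} -o ${output} -se ${input}"
    }

    # Number of online cores; fall back to 1 if neither tool is available.
    cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

    atropos_cmd "${cores}" reads.fq.gz trimmed.fq.gz

Using every available core is only one possible choice; on a shared machine a smaller fixed value may be more appropriate.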

To take advantage of the new aligner (if you have paired-end reads with 3' adapters), set the ``--aligner`` option to ``insert``:

.. code-block:: bash

    atropos --aligner insert -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG \
      -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o trimmed.1.fq.gz -p trimmed.2.fq.gz \
      -pe1 reads.1.fq.gz -pe2 reads.2.fq.gz
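
When several paired-end samples share the same adapters, the command above can be generated per sample. This is a sketch under stated assumptions: the input files are assumed to be named ``<sample>.1.fq.gz`` and ``<sample>.2.fq.gz``, the output names and the sample names ``sampleA`` and ``sampleB`` are placeholders, and the helper ``paired_cmd`` is hypothetical. It prints each command instead of running it:

.. code-block:: bash

    #!/usr/bin/env bash
    ADAPTER1=AGATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGCCGTCTTCTGCTTG
    ADAPTER2=AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT

    # Hypothetical helper (not part of Atropos): print the paired-end
    # command for one sample prefix, using only the options shown above.
    paired_cmd() {
        local sample=$1
        echo "atropos --aligner insert -a ${ADAPTER1} -A ${ADAPTER2}" \
             "-o ${sample}.trimmed.1.fq.gz -p ${sample}.trimmed.2.fq.gz" \
             "-pe1 ${sample}.1.fq.gz -pe2 ${sample}.2.fq.gz"
    }

    # Placeholder sample names; replace with your own prefixes.
    for sample in sampleA sampleB; do
        paired_cmd "${sample}"
    done

Once the printed commands look right, they can be executed, for example by piping the loop's output to ``bash``.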

See the `Documentation <https://atropos.readthedocs.org/>`_ for more complete usage information.

Links
-----

* `Documentation <https://atropos.readthedocs.org/>`_
* `Source code <https://github.com/jdidion/atropos/>`_
* `Report an issue <https://github.com/jdidion/atropos/issues>`_

Citations
---------

The citation for the original Cutadapt paper is:

Marcel Martin. "Cutadapt removes adapter sequences from high-throughput sequencing reads." EMBnet.Journal, 17(1):10-12, May 2011. http://dx.doi.org/10.14806/ej.17.1.200

A manuscript for Atropos is currently in preparation. For now, you can cite it as:

John P Didion. "Atropos." 2016. https://github.com/jdidion/atropos
