A metagenomics pipeline to estimate relative cell periods.

Menace
======

This bundle of software is a basic implementation of the algorithm for
extracting Peak-to-Trough Ratios (PTRs) from metagenomic data, as first
described in `(Korem et al., Science,
2015) <http://science.sciencemag.org/content/349/6252/1101>`__.

Installation:
-------------

Pip
~~~

Make sure that ``pip`` is the pip command of your *python2* installation,
then:

.. code:: bash

    pip install menace

Git
~~~

.. code:: bash

    git clone git@github.com:zertan/Menace.git
    cd Menace
    python setup.py install

This should install the *python* dependencies listed below. The other
dependencies have to be installed manually (if you have questions about
this, I suggest you consult your cluster IT help desk).

The software has been tested on the "hebbe" cluster at
`C3SE <http://c3se.chalmers.se>`__, which uses the "slurm" system for
resource management (thus slurm is the only queueing system currently
supported).

Dependencies:
~~~~~~~~~~~~~

::

    Python2:
    numpy
    scipy
    pandas
    biopython
    matplotlib
    xmltodict
    configparser
    lmfit
    newick
    Jinja2
    doric
    -e git+https://github.com/PathoScope/PathoScope.git#egg=pathoscope

`samtools <http://www.htslib.org/download/>`__

`bamtools <https://github.com/pezmaster31/bamtools/wiki/Building-and-installing>`__

`bowtie2 <https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.2.9/>`__

`Pathoscope
2.0 <https://sourceforge.net/projects/pathoscope/files/?source=navbar>`__
(should be installed by the above pip command, but make sure
``pathoscope ID`` is accessible in the shell, i.e. is on the system path)

`parallel <http://www.gnu.org/software/parallel/>`__

`DoriC <http://tubic.tju.edu.cn/doric/download.php>`__ is a database of
chromosome origin locations (OriCs) and a recommended, optional
dependency of the pipeline. Please visit the link and enter your e-mail
address to download it.

Usage
-----

You can get an overview of the menace functionality by running
``menace -h``.

1. Initialize a project in the current directory by running ``menace init``.
   Identify a set of NCBI genome reference accession numbers and put
   them in "./searchStrings" (or use the default one, which includes a
   *minimal* set of references to bacteria common in the human gut).

2. Identify a metagenomic cohort of interest (download manually or add
   URLs as described below) and add it to the Data folder. Supported input:
   raw/gzipped/bzipped ".fastq" files.

3. Add information to the ``project.conf`` file.

4. Edit ``loadmodules.sh`` to include the **python2** module of the
   cluster (or comment out the lines if python2 is accessible by
   default).

5. Run ``menace full`` (use ``nohup {cmd} &`` to keep the job alive after
   logout if on a cluster login node).

6. Wait for the job to complete. Run ``menace collect`` in the project
   directory.

Notes
~~~~~

The menace script is a common utility for all parts of the pipeline,
including downloading references and metagenomic data, building a
reference index, setting up the necessary file structure and submitting
jobs to slurm. Hence, all configuration is intended to be set up in
``project.conf`` (please see ``bin/project.conf.example`` for an example).
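
As a rough illustration of the configuration step, the sketch below reads an
INI-style file with ``configparser`` (which appears in the dependency list).
The ``[Project]`` section and all key names are hypothetical placeholders
chosen for this example; the real names are those in
``bin/project.conf.example``.

.. code:: python

    import configparser

    # Hypothetical project.conf contents; the real section and key names
    # are defined in bin/project.conf.example and may differ.
    EXAMPLE_CONF = """
    [Project]
    data_path = ./Data
    ref_path = ./References
    doric_path = ./DoriC
    output_path = ./Out
    """.replace("\n    ", "\n")


    def load_paths(text):
        """Return the settings of the (hypothetical) [Project] section as a dict."""
        parser = configparser.ConfigParser()
        parser.read_string(text)
        return dict(parser["Project"])


    paths = load_paths(EXAMPLE_CONF)
    print(paths["data_path"])  # ./Data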

The default 'searchStrings' file is only an example and will most probably
not fit your purposes. A more comprehensive reference library will yield
higher coverage and more accurate values. A more comprehensive list of
human gut bacteria is available at 'extra/referenceACClong.txt'.

Directory structure (*example*)
-------------------------------

With the usage example above, the path structure(s) will look something
like the following.

::

    $DATA_PATH
    ├ "Sample01" (eg. ERR525688)
    . ├ {sample01_1.fastq.gz}
    . └ {sample01_2.fastq.gz}        paired metagenomic reads
    .

    $REF_PATH
    ├ Index
    | └ {REF_NAME.*.bt2l}            bowtie2 index files
    ├ Fasta
    | └ {accession.fasta}
    ├ Headers
    | └ {accession.xml}              xml files containing extra genome references info
    └ taxIDs.txt

    $DORIC_PATH
    ├ bacteria_record.dat
    └ bacteria_seq.fas

    $OUTPUT_PATH
    ├ "Sample01"
    . ├ depth
    . | └ {accession.depth}          coverage files for each reference
    . ├ log
      | └ {accession.log}            output logs from piecewiseFit
      ├ npy
      | └ {accession_OriC_TerC.npy}  numpy files with origin/terminus locations and relative C periods
      ├ png
      | └ {accession_fit.png}        images of piecewise fit of the smoothed coverage
      └ accession-sam-report.tsv     Pathoscope2 reassignment report
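
The layout above lends itself to a simple directory scan. The following
minimal sketch assumes that each sample folder under ``$OUTPUT_PATH`` holds
an ``npy`` subfolder; the helper name ``list_fits`` and the file name
``ACC0001_OriC_TerC.npy`` are made up for illustration and are not part of
menace.

.. code:: python

    import os
    import tempfile


    def list_fits(output_path):
        """Map each sample directory to the sorted .npy names in its npy/ folder."""
        fits = {}
        for sample in sorted(os.listdir(output_path)):
            npy_dir = os.path.join(output_path, sample, "npy")
            if os.path.isdir(npy_dir):
                fits[sample] = sorted(
                    name for name in os.listdir(npy_dir) if name.endswith(".npy")
                )
        return fits


    # Build a throwaway tree that mimics the assumed layout, then scan it.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "Sample01", "npy"))
    open(os.path.join(root, "Sample01", "npy", "ACC0001_OriC_TerC.npy"), "w").close()
    print(list_fits(root))  # {'Sample01': ['ACC0001_OriC_TerC.npy']}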

Contents
--------

Below follows a description of the main scripts in the package.

jobscript
~~~~~~~~~

A submit script for sending a batch job to slurm for parallel processing
on a computing cluster.

**input:** none

**output:** directory structure as specified in "project.conf"

mainBuild.sh
~~~~~~~~~~~~

The main build script with commands intended to be executed on the
cluster.

**input:** none

**output:** temporary paths and files on compute nodes

PTRMatrix.py
~~~~~~~~~~~~

Traverses the specified directory generated by mainBuild.sh and
assembles information from each sample into tabular form (e.g. averages
origin locations from many samples for a better estimate).

**input:** $OUTPUT\_PATH, $DORIC\_PATH, $REF\_PATH, bin/accLoc.csv

**output:** Abundance.csv, PTR.csv, DoublingTime.csv, Header.csv
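
How PTRMatrix.py averages origin locations is not spelled out here. One
possible approach for a circular chromosome is a circular mean, sketched
below; the function name and the toy numbers are assumptions for
illustration.

.. code:: python

    import math


    def circular_mean(positions, genome_length):
        """Mean position on a circular genome, computed via unit vectors."""
        angles = [2.0 * math.pi * p / genome_length for p in positions]
        x = sum(math.cos(a) for a in angles)
        y = sum(math.sin(a) for a in angles)
        mean_angle = math.atan2(y, x) % (2.0 * math.pi)
        return mean_angle * genome_length / (2.0 * math.pi)


    # Origin estimates that straddle position 0 of a 1000 bp toy genome.
    estimates = [990, 10, 5, 995]
    print(int(round(circular_mean(estimates, 1000))) % 1000)  # 0

A plain arithmetic mean of the same four values gives 500, which lies on the
opposite side of the toy chromosome.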

piecewiseFit.py
~~~~~~~~~~~~~~~

Implements the piecewise linear fit and prior checks on the generated
depth files to select those instances in which enough data was
generated to produce a reliable coverage signal for estimating
replication origins. This data can be used later, once the origins have
been estimated using the full cohort, to produce PTR values for each
sample.

**input:** {reference.depth}

**output:** {reference\_OriC.npy}, {reference\_TerC.npy},
{reference\_coverage.png}, {reference\_fit.log}
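
To illustrate the idea of a piecewise linear fit, the sketch below models
log2 coverage on a circular genome as falling linearly from the origin to
the terminus along both arcs, and searches all origin/terminus bin pairs by
ordinary least squares. This is a simplified stand-in, not the code in
piecewiseFit.py: the exhaustive search, the function names and the
noise-free toy data are all assumptions made for this example.

.. code:: python

    def arc_fraction(x, ori, ter, n):
        """Fraction of the way from ori to ter along the arc that contains bin x."""
        fwd = (ter - ori) % n  # bins on the forward arc ori -> ter
        d = (x - ori) % n      # forward distance of x from ori
        if d <= fwd:
            return float(d) / fwd
        return float(n - d) / (n - fwd)


    def fit_pair(y, ori, ter):
        """Least-squares fit of y = p + s * f(x); returns (sse, p, s)."""
        n = len(y)
        f = [arc_fraction(x, ori, ter, n) for x in range(n)]
        mf = sum(f) / n
        my = sum(y) / n
        sff = sum((a - mf) ** 2 for a in f)
        sfy = sum((a - mf) * (b - my) for a, b in zip(f, y))
        s = sfy / sff    # slope: log2 coverage change from ori to ter
        p = my - s * mf  # fitted log2 coverage at the origin
        sse = sum((b - (p + s * a)) ** 2 for a, b in zip(f, y))
        return sse, p, s


    def fit_coverage(log2_cov):
        """Return (ori, ter, ptr) of the best fit with a peak at the origin."""
        n = len(log2_cov)
        best = None
        for ori in range(n):
            for ter in range(n):
                if ori == ter:
                    continue
                sse, _, s = fit_pair(log2_cov, ori, ter)
                # s < 0 keeps only fits where coverage peaks at the origin.
                if s < 0 and (best is None or sse < best[0]):
                    best = (sse, ori, ter, 2.0 ** (-s))
        return best[1], best[2], best[3]


    # Noise-free toy data: 40 bins, origin at bin 5, terminus at bin 25,
    # log2 coverage 3.0 at the origin and 2.0 at the terminus (PTR = 2).
    toy = [3.0 - arc_fraction(x, 5, 25, 40) for x in range(40)]
    ori, ter, ptr = fit_coverage(toy)
    print("ori=%d ter=%d PTR=%.2f" % (ori, ter, ptr))

On this noise-free input the search recovers the origin at bin 5, the
terminus at bin 25 and a PTR of 2. Real depth files are noisy and need
binning, smoothing and the prior checks described above.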

fetchSeq.py
~~~~~~~~~~~

This utility can be used to download '.fasta' reference files from the
NCBI servers.

**input:** searchStrings.txt

**output:** {reference.fasta}, {reference.xml}, taxIDs.txt
