
tail-tools 0.23

Analyse PAT-Seq RNA expression data.

Latest Version: 0.26

==============
= Tail Tools =
==============

http://www.vicbioinformatics.com/software.tail-tools.shtml

This is a Python 2 based suite of tools for analysing SOLiD or Illumina
sequencing reads with poly(A) tails.

Use of PyPy is recommended for speed.


License:
========

This software is distributed under the terms of the GPL, version 2 or later,
excepting that:

- The third party javascript libraries included for convenience
  in directory tail_tools/web/third_party are covered by the terms of
  their respective licenses (also in that directory).

- The remaining files in the directory tail_tools/web are placed in the
  public domain.


Requirements:
=============

- "nesoni", available from http://vicbioinformatics.com/nesoni.shtml or using

    pip install nesoni

  You don't need to install all of nesoni's dependencies; only Python
  (2.7 or later) or PyPy is required.

- bowtie2
  (SHRiMP for legacy color-space data)

- The "convert" tool from ImageMagick.

- rsync (for downloads from UCSC browser)
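Before running the pipeline it can help to confirm that these requirements
are visible to Python. The script below is a small, hypothetical helper (not
part of tail-tools) that looks for bowtie2, convert and rsync on the PATH and
tries to import nesoni. SHRiMP is only needed for legacy color-space data, so
it is not checked here.

```python
# Hypothetical requirements check; not part of tail-tools itself.
try:
    from shutil import which  # Python 3.3+
except ImportError:
    from distutils.spawn import find_executable as which  # Python 2.7

# External programs named in the Requirements list.
REQUIRED_TOOLS = ('bowtie2', 'convert', 'rsync')


def check_requirements(tools=REQUIRED_TOOLS):
    """Return a dict mapping each requirement to True if it was found."""
    status = {}
    for tool in tools:
        # which() returns None when the program is not on the PATH.
        status[tool] = which(tool) is not None
    try:
        import nesoni  # noqa: F401
        status['nesoni'] = True
    except (ImportError, SyntaxError):
        # SyntaxError covers a Python 2 only install imported from Python 3.
        status['nesoni'] = False
    return status


if __name__ == '__main__':
    for name, found in sorted(check_requirements().items()):
        print('%-8s %s' % (name, 'found' if found else 'MISSING'))
```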


Installation:
=============

Easy way:

  pip install tail-tools

From source:

  python setup.py install

For PyPy it seems to be currently easiest to set up in a virtualenv:

  virtualenv -p pypy myenv
  myenv/bin/pip install tail-tools


Usage:
======

This package contains a number of tools, which can be listed by typing:

  tail-tools


The package can be used directly from the source directory with:

  python -m tail_tools


These tools may also be used as part of a nesoni-style workflow python script.

Typical usage of the pipeline is described below.


Reference format:
=================

Before processing any reads, you need to create a "tail-tools reference directory".

References are most easily downloaded from the UCSC browser using:

  tail-tools make-ucsc-reference: \
      <output_dir> \
      <ucsc_reference_name>

If creating your own reference, it needs to consist of:

- sequences, e.g. in FASTA format
- annotations in GFF3 format

The reference directory is then created with the command:

  tail-tools make-tt-reference: \
      <output_dir> \
      <sequence_file> \
      <annotations_file>

Annotations shall include the following feature types and attributes:

gene
    required attributes:
    - ID      - unique identifier
    optional attributes:
    - Name    - nomenclature name
    - Product - short description

mRNA
    required attributes:
    - ID      - unique identifier
    - Parent  - gene ID

CDS
    required attributes:
    - Parent  - mRNA ID

exon
    required attributes:
    - Parent  - mRNA ID
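To illustrate these rules, the sketch below is a hypothetical checker (not the
validation that make-tt-reference actually performs). It reports gene, mRNA,
CDS and exon features that lack a required attribute, and Parent links that
do not point at a feature of the expected type; other feature types are
ignored. It assumes plain tab-separated GFF3 and does not decode
percent-escaped attribute values.

```python
# Hypothetical checker for the annotation rules above
# (not tail-tools' own validation code).

# Required attributes for each feature type listed above.
REQUIRED = {
    'gene': ['ID'],
    'mRNA': ['ID', 'Parent'],
    'CDS':  ['Parent'],
    'exon': ['Parent'],
}

# The feature type that each type's Parent attribute should refer to.
PARENT_TYPE = {'mRNA': 'gene', 'CDS': 'mRNA', 'exon': 'mRNA'}


def parse_attributes(field):
    """Parse a GFF3 column-9 string 'key=value;key=value' into a dict.
    Percent-escaped characters are left undecoded in this sketch."""
    attrs = {}
    for item in field.strip().split(';'):
        if '=' in item:
            key, value = item.split('=', 1)
            attrs[key.strip()] = value.strip()
    return attrs


def check_gff3(lines):
    """Return a list of problems found in GFF3 lines (empty if none)."""
    problems = []
    features = []   # (line number, feature type, attributes)
    id_types = {}   # ID -> feature type
    for number, line in enumerate(lines, 1):
        line = line.rstrip('\n')
        if line.startswith('##FASTA'):
            break  # embedded sequences follow; no more features
        if not line.strip() or line.startswith('#'):
            continue
        fields = line.split('\t')
        if len(fields) != 9:
            problems.append('line %d: expected 9 tab-separated columns' % number)
            continue
        ftype, attrs = fields[2], parse_attributes(fields[8])
        features.append((number, ftype, attrs))
        if 'ID' in attrs:
            if ftype in ('gene', 'mRNA') and attrs['ID'] in id_types:
                problems.append('line %d: duplicate ID %r' % (number, attrs['ID']))
            id_types[attrs['ID']] = ftype

    # Second pass, so a Parent may be defined later in the file.
    for number, ftype, attrs in features:
        for key in REQUIRED.get(ftype, []):
            if key not in attrs:
                problems.append('line %d: %s lacks required attribute %s'
                                % (number, ftype, key))
        if ftype in PARENT_TYPE and 'Parent' in attrs:
            for parent in attrs['Parent'].split(','):
                if id_types.get(parent) != PARENT_TYPE[ftype]:
                    problems.append('line %d: %s Parent %r is not a %s ID'
                                    % (number, ftype, parent, PARENT_TYPE[ftype]))
    return problems
```

Each problem names the offending line, so a file can be passed in directly
(e.g. check_gff3(open('annotations.gff'))) and fixed before building the
reference directory.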



Pipeline:
=========

Having created a reference directory, the next step is to run the pipeline,
"analyse-polya-batch". This can be done from the command line, but is more
usefully done from a python script. We suggest adapting the following example
to your data:


import tail_tools, nesoni, glob

tags = [
    ('logRep1',         ['BY',   'rep1']),
    ('logRep2',         ['BY',   'rep2']),
    ('deltaccr4logRep1',['ccr4', 'rep1']),
    ('deltaccr4logRep2',['ccr4', 'rep2']),
    ('deltaccr4logRep3',['ccr4', 'rep3']),
    ('YPEGRep1',        ['ypeg', 'rep1']),
    ('YPEGRep2',        ['ypeg', 'rep2']),
    ('GALRep1',         ['gal',  'rep1']),
    ('GALRep2',         ['gal',  'rep2']),
    ('GLU10Rep1',       ['glu10','rep1']),
    ('GLU10Rep2',       ['glu10','rep2']),
    ('GLU20Rep1',       ['glu20','rep1']),
    ('GLU20Rep2',       ['glu20','rep2']),
]

# For each sample we create a tail_tools.Analyse_polya instance
# Each sample is given a set of tags
samples = [ ]
# (sample_tags avoids rebinding the 'tags' list being iterated over)
for name, sample_tags in tags:
    reads = sorted(glob.glob('mydata/Sample_scBY4741%s/*.fastq.gz' % name))
    samples.append(tail_tools.Analyse_polya(
        name,
        reads = reads,
        tags = sample_tags,
        ))

action = tail_tools.Analyse_polya_batch(
        # Output directory
        'yeast-june-2013',

        # Title for report
        title = 'Yeast June 2013',

        # Files in report will have this prefix
        file_prefix = 'yeast-june-2013',

        # Reference directory you created earlier
        reference = 'sacCer3',

        # Allow reads/peaks this far downstrand of
        # the annotated transcript end point
        # For sparser genomes than yeast, perhaps use 1000
        extension = 200,

        # Whether to include .genome file for IGV in plots tarball
        # Not necessary for model organisms where IGV
        # already provides the genome.
        include_genome = False,

        # List of instances of tail_tools.Analyse_polya
        samples = samples,

        # List of sample groups
        # A sample group is specified as <nesoni-selection-expression>=<name>
        # See nesoni help for description of selection expressions,
        # this uses the tags given to each sample to concisely
        # specify sets of samples.
        groups = [ 'BY=BY', 'ccr4=ccr4', 'ypeg=ypeg', 'gal=gal', 'glu10=glu10' ],
        )



# A little boilerplate so that
# - multiprocessing works
# - you can control making
#   (see nesoni help on --make-* flags)

def main():
    action.make()

if __name__ == '__main__':
    nesoni.run_script(main)

# If run again with adjusted parameters,
# only the parts that need to be run again will run.
#
# To force a complete re-run:
#     python myscript.py --make-do all
#
# To re-run everything but the alignment to reference
# (eg if there is a new version of tail-tools)
#     python myscript.py --make-do all --make-done analyse-polya
#
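
The extension parameter in the script above is strand-aware: "downstrand" of
the transcript end means towards higher coordinates for a forward-strand
transcript and towards lower coordinates for a reverse-strand one. A minimal
sketch of that idea follows, assuming 0-based half-open coordinates (an
assumption, not necessarily how tail-tools stores intervals); the function
names are hypothetical.

```python
def extended_interval(start, end, strand, extension):
    """Extend the half-open interval [start, end) by `extension` bases
    downstrand of the transcript's 3' end.

    On the '+' strand the 3' end is `end`, so the interval grows rightwards.
    On the '-' strand the 3' end is `start`, so it grows leftwards
    (clamped at 0; clamping to the chromosome length is not handled here).
    """
    if extension < 0:
        raise ValueError('extension must be non-negative')
    if strand == '+':
        return start, end + extension
    if strand == '-':
        return max(0, start - extension), end
    raise ValueError('strand must be "+" or "-"')


def peak_in_extended_region(peak_pos, start, end, strand, extension=200):
    """True if peak_pos falls within the extended transcript interval."""
    lo, hi = extended_interval(start, end, strand, extension)
    return lo <= peak_pos < hi
```

For example, with extension = 200 a peak at position 2100 would fall within a
forward-strand transcript spanning [1000, 2000), but not within a
reverse-strand transcript at the same coordinates.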
 
File                     Type    Uploaded on  Size
tail-tools-0.23.tar.gz   Source  2014-02-04   119KB