skip to navigation
skip to content

Not Logged In

tail-tools 0.29

Analyse PAT-Seq RNA expression data.

Latest Version: 0.40


==============
= Tail Tools =
==============

http://www.vicbioinformatics.com/software.tail-tools.shtml

This is a Python 2 based suite of tools for analysing SOLiD or Illumina
sequencing reads with poly(A) tails.

Use of PyPy is recommened for speed.


License:
========

This software is distributed under the terms of the GPL, version 2 or later,
excepting that:

- The third party javascript libraries included for convenience
in directory tail_tools/web/third_party are covered by the terms of
their respective licenses (also in that directory).

- The remaining files in the directory tail_tools/web are placed in the
public domain.


Requirements:
=============

- "nesoni", available from http://vicbioinformatics.com/nesoni.shtml or using

pip install nesoni

You don't need to install all of nesoni's dependencies, just Python 2.7
or later or PyPy.

- bowtie2
(SHRiMP for legacy color-space data)

- The "convert" tool from ImageMagick.

- rsync (for downloads from UCSC browser)


Installation:
=============

Easy way:

pip install tail-tools

From source:

python setup.py install

For PyPy it seems to be currently easiest to set up in a virtualenv:

virtualenv -p pypy myenv
myenv/bin/pip install tail-tools


Usage:
======

This package contains a number of tools, which can be listed by typing:

tail-tools


The package can be used directly from the source directory with:

python -m tail_tools


These tools may also be used as part of a nesoni-style workflow python script.

Typical usage of the pipeline is described below.


Reference format:
=================

Before processing any reads, you need to create a "tail-tools reference directory".

References are most easily downloaed from the UCSC browser using:

tail-tools make-ucsc-reference: \
<output_dir> \
<ucsc_reference_name>

If creating your own reference, it needs to consist of:

- sequences, eg in FASTA format
- annotations in GFF3 format

The reference directory is then created with the command:

tail-tools make-tt-reference: \
<output_dir> \
<sequence_file> \
<annotations_file>

Annotations shall include the following feature types and attributes:

gene
required attributes:
- ID - unique identifier
optional attributes:
- Name - nomenclature name
- Product - short description

mRNA
required attributes:
- ID - unique identifier
- Parent - gene ID

CDS
required attributes:
- Parent - mRNA ID

exon
required attributes:
- Parent - mRNA ID



Pipeline:
=========

Having created a reference directory, the next step is to run the pipeline,
"analyse-polya-batch". This can be done from the command line, but is more
usefully done from a python script. We suggest adapting the following example
to your data:


import tail_tools, nesoni, glob

tags = [
('logRep1', ['BY', 'rep1']),
('logRep2', ['BY', 'rep2']),
('deltaccr4logRep1',['ccr4', 'rep1']),
('deltaccr4logRep2',['ccr4', 'rep2']),
('deltaccr4logRep3',['ccr4', 'rep3']),
('YPEGRep1', ['ypeg', 'rep1']),
('YPEGRep2', ['ypeg', 'rep2']),
('GALRep1', ['gal', 'rep1']),
('GALRep2', ['gal', 'rep2']),
('GLU10Rep1', ['glu10','rep1']),
('GLU10Rep2', ['glu10','rep2']),
('GLU20Rep1', ['glu20','rep1']),
('GLU20Rep2', ['glu20','rep2']),
]

# For each sample we create a tail_tools.Analyse_polya instance
# Each sample is given a set of tags
samples = [ ]
for name, tags in tags:
reads = sorted(glob.glob('mydata/Sample_scBY4741%s/*.fastq.gz' % name))
samples.append(tail_tools.Analyse_polya(
name,
reads = reads,
tags = tags,
))

action = tail_tools.Analyse_polya_batch(
# Output directory
'yeast-june-2013',

# Title for report
title = 'Yeast June 2013',

# Files in report will have this prefix
file_prefix = 'yeast-june-2013',

# Reference directory you created earlier
reference = 'sacCer3',

# Allow reads/peaks this far downstrand of
# the annotated transcript end point
# For sparser genomes than yeast, perhaps use 1000
extension = 200,

# Whether to include .genome file for IGV in plots tarball
# Not necessary if for model organisms where IGV
# already provides the genome.
include_genome = False,

# List of instances of tail_tools.Analyse_polya
samples = samples,

# List of sample groups
# A sample group is specified as <nesoni-selection-expression>=<name>
# See nesoni help for description of selection expressions,
# this uses the tags given to each sample to concisely
# specify sets of samples.
groups = [ 'BY=BY', 'ccr4=ccr4', 'ypeg=ypeg', 'gal=gal', 'glu10=glu10' ],

# (Advanced)
# Perform differential tests
tests = [
tail_tools.Test(
'BY-ccr4',
title='BY vs ccr4',
null=['BY/ccr4'],
alt=['ccr4'],
),
#etc
],
)



# A little boilerplate so that
# - multiprocessing works
# - you can control making
# (see nesoni help on --make-* flags)

def main():
action.make()

if __name__ == '__main__':
nesoni.run_script(main)

# If run again with adjusted parameters,
# only the parts that need to be run again will run.
#
# To force a complete re-run:
# python myscript.py --make-do all
#
# To re-run everything but the alignment to reference
# (eg if there is a new version of tail-tools)
# python myscript.py --make-do all --make-done analyse-polya
#




 
File Type Py Version Uploaded on Size
tail-tools-0.29.tar.gz (md5) Source 2014-05-14 124KB
  • Downloads (All Versions):
  • 131 downloads in the last day
  • 925 downloads in the last week
  • 3746 downloads in the last month