tail-tools

Analyse PAT-Seq RNA expression data.


==============
= Tail Tools =
==============

http://www.vicbioinformatics.com/software.tail-tools.shtml

This is a Python 2-based suite of tools for analysing SOLiD or Illumina
sequencing reads with poly(A) tails.

Use of PyPy is recommended for speed.

License:
========

This software is distributed under the terms of the GPL, version 2 or later,
except that:

- The third-party JavaScript libraries included for convenience
  in the directory tail_tools/web/third_party are covered by the terms of
  their respective licenses (also in that directory).

- The remaining files in the directory tail_tools/web are placed in the
  public domain.

Requirements:
=============

- "nesoni", available from http://vicbioinformatics.com/nesoni.shtml or using

    pip install nesoni

  You don't need to install all of nesoni's dependencies, just Python 2.7
  or later, or PyPy.

- bowtie2
  (or SHRiMP, for legacy color-space data)

- The "convert" tool from ImageMagick.

- rsync (for downloads from the UCSC Genome Browser)

Installation:
=============

Easy way:

    pip install tail-tools

From source:

    python setup.py install

For PyPy, it currently seems easiest to install into a virtualenv:

    virtualenv -p pypy myenv
    myenv/bin/pip install tail-tools

Usage:
======

This package contains a number of tools, which can be listed by typing:

    tail-tools

The package can be used directly from the source directory with:

    python -m tail_tools

These tools may also be used as part of a nesoni-style workflow Python script.

Typical usage of the pipeline is described below.

Reference format:
=================

Before processing any reads, you need to create a "tail-tools reference directory".

References are most easily downloaded from the UCSC Genome Browser using:

    tail-tools make-ucsc-reference: \
        <output_dir> \
        <ucsc_reference_name>

If you are creating your own reference, it needs to consist of:

- sequences, e.g. in FASTA format
- annotations in GFF3 format

The reference directory is then created with the command:

    tail-tools make-tt-reference: \
        <output_dir> \
        <sequence_file> \
        <annotations_file>

Annotations must include the following feature types and attributes:

gene
    required attributes:
    - ID - unique identifier
    optional attributes:
    - Name - nomenclature name
    - Product - short description

mRNA
    required attributes:
    - ID - unique identifier
    - Parent - gene ID

CDS
    required attributes:
    - Parent - mRNA ID

exon
    required attributes:
    - Parent - mRNA ID
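
Before running make-tt-reference it may help to check a GFF3 file against the rules above. The following is a minimal sketch, not part of tail-tools: check_gff3 and its helpers are hypothetical names, it assumes a tab-separated GFF3 file, and it treats only gene and mRNA IDs as needing to be unique (GFF3 lets a multi-line feature such as a CDS repeat its ID).

```python
# Hypothetical pre-flight check for the annotation rules listed above.
# Not part of tail-tools; make-tt-reference does its own processing.

REQUIRED = {
    "gene": ["ID"],
    "mRNA": ["ID", "Parent"],
    "CDS": ["Parent"],
    "exon": ["Parent"],
}

# Feature type that each kind of Parent attribute must point to
EXPECTED_PARENT = {"mRNA": "gene", "CDS": "mRNA", "exon": "mRNA"}


def parse_attributes(column):
    """Parse the GFF3 ninth column ("key=value" pairs separated by ";")."""
    attributes = {}
    for item in column.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def check_gff3(lines):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    ids = {}      # gene/mRNA ID -> feature type
    parents = []  # (line number, feature type, parent ID)
    for number, line in enumerate(lines, 1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            problems.append("line %d: expected 9 tab-separated columns" % number)
            continue
        feature_type = fields[2]
        if feature_type not in REQUIRED:
            continue  # other feature types are not checked
        attributes = parse_attributes(fields[8])
        for name in REQUIRED[feature_type]:
            if name not in attributes:
                problems.append("line %d: %s lacks %s" % (number, feature_type, name))
        if feature_type in ("gene", "mRNA") and "ID" in attributes:
            if attributes["ID"] in ids:
                problems.append("line %d: duplicate ID %s" % (number, attributes["ID"]))
            ids[attributes["ID"]] = feature_type
        if feature_type in EXPECTED_PARENT and "Parent" in attributes:
            # GFF3 allows several comma-separated parents
            for parent in attributes["Parent"].split(","):
                parents.append((number, feature_type, parent))
    # Parents may be defined after their children, so check at the end
    for number, feature_type, parent in parents:
        wanted = EXPECTED_PARENT[feature_type]
        if ids.get(parent) != wanted:
            problems.append("line %d: %s parent %s is not a known %s ID"
                            % (number, feature_type, parent, wanted))
    return problems


example = [
    "##gff-version 3",
    "chrI\tex\tgene\t100\t900\t.\t+\t.\tID=gene1;Name=ABC1",
    "chrI\tex\tmRNA\t100\t900\t.\t+\t.\tID=mRNA1;Parent=gene1",
    "chrI\tex\texon\t100\t900\t.\t+\t.\tParent=mRNA1",
    "chrI\tex\tCDS\t150\t800\t.\t+\t0\tParent=mRNA1",
]
print(check_gff3(example))  # []
```

Checking parent links after the whole file has been read means the sketch does not depend on parents appearing before their children.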

Pipeline:
=========

Having created a reference directory, the next step is to run the pipeline,
"analyse-polya-batch". This can be done from the command line, but is more
usefully done from a Python script. We suggest adapting the following example
to your data:

    import tail_tools, nesoni, glob

    # (sample name, tags) pairs
    tags = [
        ('logRep1',          ['BY',    'rep1']),
        ('logRep2',          ['BY',    'rep2']),
        ('deltaccr4logRep1', ['ccr4',  'rep1']),
        ('deltaccr4logRep2', ['ccr4',  'rep2']),
        ('deltaccr4logRep3', ['ccr4',  'rep3']),
        ('YPEGRep1',         ['ypeg',  'rep1']),
        ('YPEGRep2',         ['ypeg',  'rep2']),
        ('GALRep1',          ['gal',   'rep1']),
        ('GALRep2',          ['gal',   'rep2']),
        ('GLU10Rep1',        ['glu10', 'rep1']),
        ('GLU10Rep2',        ['glu10', 'rep2']),
        ('GLU20Rep1',        ['glu20', 'rep1']),
        ('GLU20Rep2',        ['glu20', 'rep2']),
        ]

    filename_pattern = 'mydata/Sample_scBY4741%s/*.fastq.gz'

    # For each sample we create a tail_tools.Analyse_polya instance.
    # Each sample is given a set of tags.
    # (The loop variable is called sample_tags so that it does not
    # shadow the tags list above.)
    samples = [ ]
    for name, sample_tags in tags:
        reads = sorted(glob.glob(filename_pattern % name))
        samples.append(tail_tools.Analyse_polya(
            name,
            reads = reads,
            tags = sample_tags,
            ))

    action = tail_tools.Analyse_polya_batch(
        # Output directory
        'yeast-june-2013',

        # Title for report
        title = 'Yeast June 2013',

        # Files in report will have this prefix
        file_prefix = 'yeast-june-2013',

        # Reference directory you created earlier
        reference = 'sacCer3',

        # Allow reads/peaks this far downstrand of
        # the annotated transcript end point.
        # For sparser genomes than yeast, perhaps use 1000.
        extension = 200,

        # Whether to include .genome file for IGV in plots tarball.
        # Not necessary for model organisms where IGV
        # already provides the genome.
        include_genome = False,

        # Whether to generate IGV plots.
        # This is currently memory intensive!
        include_plots = True,

        # List of instances of tail_tools.Analyse_polya
        samples = samples,

        # List of sample groups.
        # A sample group is specified as <nesoni-selection-expression>=<name>
        # See nesoni help for a description of selection expressions;
        # this uses the tags given to each sample to concisely
        # specify sets of samples.
        groups = [ 'BY=BY', 'ccr4=ccr4', 'ypeg=ypeg', 'gal=gal', 'glu10=glu10' ],

        # (Advanced)
        # Perform differential tests
        tests = [
            tail_tools.Test(
                'BY-ccr4',
                title='BY vs ccr4',
                null=['BY/ccr4'],
                alt=['ccr4'],
                ),
            # etc
            ],
        )

    # A little boilerplate so that
    # - multiprocessing works
    # - you can control making
    #   (see nesoni help on --make-* flags)

    def main():
        action.make()

    if __name__ == '__main__':
        nesoni.run_script(main)

    # If run again with adjusted parameters,
    # only the parts that need to be run again will run.
    #
    # To force a complete re-run:
    #   python myscript.py --make-do all
    #
    # To re-run everything but the alignment to reference
    # (e.g. if there is a new version of tail-tools):
    #   python myscript.py --make-do all --make-done analyse-polya
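
The "extension" parameter in the example allows reads and peaks some distance "downstrand" of an annotated transcript's end. The sketch below only illustrates what downstrand means on each strand; it is not tail-tools code, assumes 0-based half-open coordinates, and ignores neighbouring genes and chromosome length, which tail-tools may well take into account. extend_downstrand and in_window are hypothetical names.

```python
# Toy illustration, not tail-tools code: what "downstrand" of a transcript's
# annotated end means on each strand. Assumes 0-based, half-open intervals
# and ignores neighbouring genes and chromosome length.


def extend_downstrand(start, end, strand, extension):
    """Return (start, end) widened by `extension` bases past the 3' end."""
    if strand == "+":
        return start, end + extension           # 3' end is on the right
    if strand == "-":
        return max(0, start - extension), end   # 3' end is on the left
    raise ValueError("strand must be '+' or '-'")


def in_window(position, start, end, strand, extension):
    """True if a 0-based position falls inside the extended interval."""
    low, high = extend_downstrand(start, end, strand, extension)
    return low <= position < high


print(extend_downstrand(1000, 2500, "+", 200))  # (1000, 2700)
print(extend_downstrand(1000, 2500, "-", 200))  # (800, 2500)
print(in_window(2650, 1000, 2500, "+", 200))    # True
```

A read ending 150 bases past the 3' end of a plus-strand gene is accepted with extension = 200, while the same coordinate lies outside a minus-strand gene's window, because that gene's 3' end is on the left.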

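Groups in the example are given as <nesoni-selection-expression>=<name>, with each selection matched against the tags attached to the samples. As a rough sketch of that idea only, the hypothetical helpers below handle the simplest case, where the selection is a single tag; nesoni's real selection expressions (see nesoni help) support more than this, and this is not nesoni's parser.

```python
# Hypothetical helpers, not part of nesoni or tail-tools.
# Only single-tag selection expressions are supported here.


def parse_group(spec):
    """Split a '<selection>=<name>' group spec into (selection, name)."""
    selection, sep, name = spec.rpartition("=")
    if not sep or not selection or not name:
        raise ValueError("expected '<selection>=<name>', got %r" % spec)
    return selection, name


def select_samples(samples, spec):
    """Return (group name, sample names whose tags include the selection)."""
    selection, name = parse_group(spec)
    members = [sample for sample, tags in samples if selection in tags]
    return name, members


# A few entries from the (sample name, tags) list in the example above
samples = [
    ("logRep1", ["BY", "rep1"]),
    ("logRep2", ["BY", "rep2"]),
    ("deltaccr4logRep1", ["ccr4", "rep1"]),
]

print(select_samples(samples, "BY=BY"))       # ('BY', ['logRep1', 'logRep2'])
print(select_samples(samples, "rep1=first"))  # ('first', ['logRep1', 'deltaccr4logRep1'])
```

Tagging each sample with both a condition and a replicate label is what lets a short expression pick out a whole condition (or a whole replicate series) without listing sample names.
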
BAM file annotations
====================

AA:i:...
  - Present if the read is considered poly(A), i.e. it has at least four
    non-templated As.

AN:i:...
  - The observed length of the non-templated poly(A) tail.

AD:i:...
  - The number of adaptor sequence bases observed after the end of
    the poly(A) sequence. If the adaptor sequence is observed, the entire
    poly(A) tail of this fragment has been sequenced.
    If absent, assume zero. Not supported for colorspace reads.
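
As a sketch of how these tags might be interpreted downstream, the hypothetical functions below read AA, AN and AD from one SAM-format text record (for example, a line of "samtools view" output). They are not part of tail-tools; for BAM files directly, a library such as pysam (AlignedSegment.has_tag and get_tag) would be the usual route. Following the rules above, a missing AD counts as zero, so colorspace reads, which lack AD, are never reported as having a complete tail.

```python
# Illustrative only: reads the AA/AN/AD tags described above from one
# SAM-format text record. Names here are hypothetical, not tail-tools API.


def parse_optional_tags(sam_line):
    """Return {tag: value} for the TAG:TYPE:VALUE fields after column 11."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, type_code, value = field.split(":", 2)
        # Integer (type "i") values are converted; others stay strings
        tags[tag] = int(value) if type_code == "i" else value
    return tags


def describe_tail(sam_line):
    """Summarise the poly(A) information carried by one alignment."""
    tags = parse_optional_tags(sam_line)
    adaptor_bases = tags.get("AD", 0)  # absent AD means zero
    return {
        "is_polya": "AA" in tags,            # AA present => poly(A) read
        "tail_length": tags.get("AN"),       # None if AN is absent
        "adaptor_bases": adaptor_bases,
        "tail_complete": adaptor_bases > 0,  # adaptor seen => whole tail read
    }


record = ("read1\t0\tchrI\t100\t40\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII"
          "\tAA:i:1\tAN:i:12\tAD:i:5")
summary = describe_tail(record)
print(summary["tail_length"])
print(summary["tail_complete"])
```

Reads where tail_complete is False still carry a tail length in AN, but that length should be read as a lower bound, since sequencing may have stopped before the end of the tail.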


Release history

- 0.42 (Nov 16, 2015)
- 0.41 (Nov 11, 2015)
- 0.41a0 pre-release (Nov 11, 2015)
- 0.40 (Jul 12, 2015)
- 0.39 (Jul 8, 2015)
- 0.38 (May 25, 2015)
- 0.37 (Mar 10, 2015)
- 0.36 (Feb 2, 2015)
- 0.35 (Oct 30, 2014)
- 0.34 (Jul 31, 2014, this version)
- 0.33 (Jun 19, 2014)
- 0.32 (Jun 17, 2014)
- 0.31 (May 21, 2014)
- 0.30 (May 19, 2014)
- 0.29 (May 14, 2014)
- 0.28 (May 12, 2014)
- 0.27 (Apr 26, 2014)
- 0.26 (Mar 26, 2014)
- 0.25 (Mar 26, 2014)
- 0.24 (Mar 25, 2014)
- 0.23 (Feb 4, 2014)
- 0.22 (Dec 18, 2013)
- 0.21 (Dec 9, 2013)
- 0.20 (Oct 17, 2013)
- 0.19 (Aug 19, 2013)
- 0.18 (Aug 19, 2013)
- 0.17 (Jun 24, 2013)
- 0.16 (May 25, 2013)
- 0.15 (May 14, 2013)
- 0.14 (May 14, 2013)
- 0.13 (Apr 17, 2013)
- 0.12 (Apr 10, 2013)
- 0.11 (Mar 20, 2013)
- 0.10 (Jan 3, 2013)

Download files

Source distribution: tail-tools-0.34.tar.gz (129.9 kB), uploaded Jul 31, 2014

Hashes for tail-tools-0.34.tar.gz:

    SHA256       06fe83fb5be6f2a988b3f8595452f22fd74256ce5a27cd3c6347db688112275a
    MD5          494d2ac73df57c2dc764e281e641bcc8
    BLAKE2b-256  7b891cfd1de05257e862f1990776ae097d2adf7c729747b7207a48715dcd8a3f