starseqr

RNA-Fusion Calling with STAR

|Travis| |Pypi| |Conda| |Last|

=========
STAR-SEQR
=========
RNA Fusion Detection and Quantification using STAR.

Post-alignment run times are typically <20 minutes using 4 threads. Development is still ongoing and several features are currently in the works. DNA breakpoint detection is still experimental.

Installation
------------

This package is tested under Linux using Python 2.7, 3.4, 3.5, and 3.6.

You can install from PyPI. Please use a recent version of pip and Cython::

    pip install -U pip
    pip install -U cython
    pip install starseqr

Or build directly from GitHub: clone the project, ``cd`` into the directory, and run::

    python setup.py install

Or from Docker::

    docker pull eagenomics/starseqr

Or from Bioconda::

    conda install -c bioconda starseqr

**Additional Requirements**

- `biobambam2 <https://github.com/gt1/biobambam2>`_ or ``conda install -c bioconda biobambam``
- `STAR <https://github.com/alexdobin/STAR>`_ or ``conda install -c bioconda star``. Must use >2.5.3a.
- `Velvet <https://github.com/dzerbino/velvet>`_ or ``conda install -c bioconda velvet``
- `samtools <https://github.com/samtools/samtools>`_ or ``conda install -c bioconda samtools``
- `Salmon <https://combine-lab.github.io/salmon/>`_ or ``conda install -c bioconda salmon``
- `UCSC utils <http://hgdownload.soe.ucsc.edu/admin/exe/>`_ or ``conda install -c bioconda ucsc-gtftogenepred``
- `gffread <http://ccb.jhu.edu/software/stringtie/dl/gffread-0.9.8c.tar.gz>`_ or ``conda install -c bioconda gffread``
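
Before building an index, it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of STAR-SEQR. The executable names (for example ``velveth`` for Velvet and ``bamsormadup`` for biobambam2) are assumptions about what each package installs, and the version parser assumes that ``STAR --version`` prints a string such as ``STAR_2.5.3a`` or ``2.7.10b``. It needs Python 3 for ``shutil.which``.

.. code-block:: python

    import re
    import shutil
    import subprocess

    # Assumed executable names; adjust them to match your installation.
    REQUIRED_EXECUTABLES = [
        "STAR", "samtools", "salmon", "velveth", "velvetg",
        "gffread", "gtfToGenePred", "bamsormadup",
    ]


    def missing_executables(names=REQUIRED_EXECUTABLES, which=shutil.which):
        """Return the executables from `names` that are not found on PATH."""
        return [name for name in names if which(name) is None]


    def parse_star_version(text):
        """Turn text such as 'STAR_2.5.3a' into a comparable tuple."""
        match = re.search(r"(\d+)\.(\d+)\.(\d+)([a-z]?)", text)
        if match is None:
            raise ValueError("no STAR version found in %r" % text)
        major, minor, patch, letter = match.groups()
        return (int(major), int(minor), int(patch), letter)


    def star_is_new_enough(text, minimum="2.5.3a"):
        """True if the reported version is strictly newer than `minimum`."""
        return parse_star_version(text) > parse_star_version(minimum)


    if __name__ == "__main__":
        missing = missing_executables()
        print("Missing executables:", ", ".join(missing) or "none")
        if "STAR" not in missing:
            reported = subprocess.check_output(["STAR", "--version"]).decode()
            print("STAR newer than 2.5.3a:", star_is_new_enough(reported))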

Build a STAR Index
------------------

First make sure the dependencies are installed and generate a STAR index for your reference.

**RNA Index**

::

    STAR --runMode genomeGenerate \
        --genomeFastaFiles hg19.fa \
        --genomeDir STAR_SEQR_hg19gencodeV24lift37_S1_RNA \
        --sjdbGTFfile gencodeV24lift37.gtf \
        --runThreadN 18 \
        --sjdbOverhang 150 \
        --genomeSAsparseD 1

Run STAR-SEQR
--------------

STAR-SEQR can perform the alignment itself or use existing outputs from STAR. Note: STAR-SEQR alignment parameters have been tuned for fusion calling.

**Python on OS**

::

    starseqr.py -1 RNA_1.fastq.gz -2 RNA_2.fastq.gz -m 1 -p RNA_test -t 12 \
        -i path/STAR_INDEX -g gencode.gtf -r hg19.fa -vv
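
When several samples need to be processed, the command above can be generated per sample. The sketch below is illustrative only: ``pair_fastqs`` and ``starseqr_command`` are hypothetical helper names, the ``_1.fastq.gz`` / ``_2.fastq.gz`` naming convention is assumed from the example file names, and only the flags shown in the command above are used.

.. code-block:: python

    import os


    def pair_fastqs(paths, suffix1="_1.fastq.gz", suffix2="_2.fastq.gz"):
        """Map each sample name to its (read 1, read 2) files; skip unpaired files."""
        available = set(paths)
        pairs = {}
        for path in sorted(available):
            if path.endswith(suffix1):
                stem = path[:-len(suffix1)]
                mate = stem + suffix2
                if mate in available:
                    pairs[os.path.basename(stem)] = (path, mate)
        return pairs


    def starseqr_command(fq1, fq2, prefix, index_dir, gtf, fasta, threads=12):
        """Build the starseqr.py call as an argument list (mode 1, verbose)."""
        return [
            "starseqr.py", "-1", fq1, "-2", fq2, "-m", "1", "-p", prefix,
            "-t", str(threads), "-i", index_dir, "-g", gtf, "-r", fasta, "-vv",
        ]


    if __name__ == "__main__":
        fastqs = ["RNA_1.fastq.gz", "RNA_2.fastq.gz"]
        for sample, (fq1, fq2) in sorted(pair_fastqs(fastqs).items()):
            cmd = starseqr_command(fq1, fq2, sample + "_test",
                                   "path/STAR_INDEX", "gencode.gtf", "hg19.fa")
            print(" ".join(cmd))

Each argument list can be passed to ``subprocess.check_call`` once ``starseqr.py`` is on the ``PATH``.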

**CWL**

Note that ``--name_prefix`` must be a string basename in this case.

::

    cwltool ~/path/STAR-SEQR/devtools/cwl/starseqr_v0.6.6.cwl \
        --fq1 /path/UHRR_1_2_5m_L4_1.clipped.fastq.gz \
        --fq2 /path/UHRR_1_2_5m_L4_2.clipped.fastq.gz \
        --star_index_dir /path/gencodev25lift37/STAR_INDEX \
        --name_prefix test_cwl \
        --transcript_gtf /path/gencodev25/gencode.v25lift37.annotation.gtf \
        --genome_fasta /path/gencodev25/GRCh37.primary_assembly.genome.fa \
        --mode 1 --worker_threads 8

**DOCKER**

Note that ``-p`` must be a fully qualified path in this case.

::

    docker run -it -v /mounts:/mounts eagenomics/starseqr:0.6.5 starseqr.py \
        -1 /mounts/path/UHRR_1_2_5m_L4_1.clipped.fastq.gz \
        -2 /mounts/path/UHRR_1_2_5m_L4_2.clipped.fastq.gz \
        -p /mounts/path/test_docker \
        -i /mounts/path/gencodev25lift37/STAR_INDEX \
        -g /mounts/path/gencodev25/gencode.v25lift37.annotation.gtf \
        -r /mounts/path/gencodev25/GRCh37.primary_assembly.genome.fa \
        -m 1 -vv

Outputs
-------
A BEDPE file is produced that is compatible with the SMC-RNA DREAM Challenge.
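
As a rough illustration of consuming the BEDPE output, the sketch below reads the ten standard BEDPE columns into named tuples. It assumes the file is tab-delimited and follows the usual BEDPE column order; the exact columns written by STAR-SEQR are not described here, so check a real output file first. The sample line and the ``read_bedpe`` helper are hypothetical.

.. code-block:: python

    import collections
    import io

    # Standard first ten BEDPE columns; any further columns are ignored here.
    BEDPE_FIELDS = ["chrom1", "start1", "end1", "chrom2", "start2", "end2",
                    "name", "score", "strand1", "strand2"]
    BedpeRecord = collections.namedtuple("BedpeRecord", BEDPE_FIELDS)


    def read_bedpe(handle):
        """Yield BedpeRecord tuples, skipping blank lines and '#' comment lines."""
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            record = BedpeRecord(*line.split("\t")[:len(BEDPE_FIELDS)])
            # Coordinates arrive as text; convert them to integers.
            yield record._replace(
                start1=int(record.start1), end1=int(record.end1),
                start2=int(record.start2), end2=int(record.end2),
            )


    if __name__ == "__main__":
        # Hypothetical record, for illustration only.
        sample = io.StringIO("chr1\t100\t101\tchr2\t500\t501\tGENE_A--GENE_B\t0\t+\t-\n")
        for rec in read_bedpe(sample):
            print(rec.name, rec.chrom1, rec.start1, rec.chrom2, rec.start2)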

Breakpoints.txt and Candidates.txt have the following columns:

.. list-table::
   :header-rows: 1
   :widths: 20 80

   * - Values
     - Description
   * - NAME
     - Gene symbols for the left and right fusion partners
   * - NREAD_SPANS
     - The number of discordant paired reads that span and support the fusion
   * - NREAD_JXNLEFT
     - The number of paired reads that are anchored on the left side of the gene fusion
   * - NREAD_JXNRIGHT
     - The number of paired reads that are anchored on the right side of the gene fusion
   * - FUSION_CLASS
     - Classification of the fusion based on chromosomal location, distance, and strand. [GENE_INTERNAL, TRANSLOCATION, READ_THROUGH, INTERCHROM_INVERTED, INTERCHROM_INTERSTRAND]
   * - SPLICE_TYPE
     - Classification of the fusion breakpoint: CANONICAL if it falls on an exon boundary, otherwise NON-CANONICAL
   * - BRKPT_LEFT
     - The 0-based genomic position of the fusion breakpoint for the left gene partner
   * - BRKPT_RIGHT
     - The 0-based genomic position of the fusion breakpoint for the right gene partner
   * - LEFT_SYMBOL
     - The left gene symbol
   * - RIGHT_SYMBOL
     - The right gene symbol
   * - ANNOT_FORMAT
     - The description of the keys used in the ANNOT columns, similar to VCF FORMAT notation
   * - LEFT_ANNOT
     - The values described in the ANNOT_FORMAT column for the left gene breakpoint
   * - RIGHT_ANNOT
     - The values described in the ANNOT_FORMAT column for the right gene breakpoint
   * - DISTANCE
     - The genomic distance between breakpoints. Empty for a translocation.
   * - ASSEMBLED_CONTIGS
     - The Velvet assembly of the supporting chimeric reads
   * - ASSEMBLY_CROSS_JXN
     - A boolean value indicating whether the assembly crosses the putative breakpoint
   * - PRIMERS
     - Left and right primers designed against the highest-expressing predicted fusion transcript
   * - ID
     - Internal notation of STAR-SEQR breakpoints
   * - SPAN_CROSSHOM_SCORE
     - Homology score with range of [0-1] indicating the probability of spanning chimeric reads mapping to both gene partners
   * - JXN_CROSSHOM_SCORE
     - Homology score with range of [0-1] indicating the probability of junction chimeric reads mapping to both gene partners
   * - OVERHANG_DIVERSITY
     - The number of unique fragments that fall from left-anchored split reads onto the right gene and vice versa
   * - MINFRAG20
     - The number of overhang fragments that have at least 20 bases
   * - MINFRAG35
     - The number of overhang fragments that have at least 35 bases
   * - TPM_FUSION
     - Expression of the most abundant fusion transcript, in transcripts per million
   * - TPM_LEFT
     - Expression of the most abundant left transcript, in transcripts per million
   * - TPM_RIGHT
     - Expression of the most abundant right transcript, in transcripts per million
   * - MAX_TRX_FUSION
     - Highest-expressing fusion transcript. Expression corresponds to TPM_FUSION.
   * - DISPOSITION
     - PASS, or a value giving the specific reason for failure
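
The columns above lend themselves to simple downstream filtering. Here is a minimal sketch that keeps calls whose DISPOSITION is PASS and that have junction-read support. It assumes the file is tab-delimited with a header row and that the read counts are integers; the two example rows, the ``GENE_A--GENE_B`` naming, and the ``passing_fusions`` helper are hypothetical.

.. code-block:: python

    import csv
    import io


    def passing_fusions(handle, min_junction_reads=1):
        """Yield rows with DISPOSITION == PASS and enough junction-read support."""
        for row in csv.DictReader(handle, delimiter="\t"):
            jxn_reads = int(row["NREAD_JXNLEFT"]) + int(row["NREAD_JXNRIGHT"])
            if row["DISPOSITION"] == "PASS" and jxn_reads >= min_junction_reads:
                yield row


    if __name__ == "__main__":
        # Hypothetical excerpt containing only the columns used here.
        example = io.StringIO(
            "NAME\tNREAD_SPANS\tNREAD_JXNLEFT\tNREAD_JXNRIGHT\tDISPOSITION\n"
            "GENE_A--GENE_B\t12\t7\t5\tPASS\n"
            "GENE_C--GENE_D\t1\t0\t0\tFAIL\n"
        )
        for row in passing_fusions(example):
            print(row["NAME"], row["NREAD_SPANS"])

For a real run, replace the in-memory example with an open handle on the Breakpoints.txt or Candidates.txt file.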

Feedback
--------

Yes! Please give us your feedback, raise issues, and let us know how the tool is working for you. Pull requests are welcome.

Contributions
-------------

This project builds on the groundwork of other public contributions, namely:

- https://github.com/pysam-developers/pysam
- https://github.com/vishnubob/ssw
- https://github.com/libnano/primer3-py

.. |Travis| image:: https://travis-ci.org/ExpressionAnalysis/STAR-SEQR.svg?branch=master
   :target: https://travis-ci.org/ExpressionAnalysis/STAR-SEQR

.. |Pypi| image:: https://badge.fury.io/py/starseqr.svg
   :target: https://badge.fury.io/py/starseqr

.. |Conda| image:: https://anaconda.org/bioconda/starseqr/badges/installer/conda.svg
   :target: https://bioconda.github.io/recipes/starseqr/README.html

.. |Last| image:: https://img.shields.io/github/last-commit/ExpressionAnalysis/STAR-SEQR.svg
   :target: https://github.com/ExpressionAnalysis/STAR-SEQR


