how-are-we-stranded-here

Python package for testing strandedness of RNA-Seq fastq files


Ever get RNA-Seq data where the library prep or strandedness has been omitted in the methods?

This package infers the strandedness from the reads themselves, sparing you the headache of realising late in your pipeline or analysis that you used the wrong strandedness setting (RF/fr-firststrand, FR/fr-secondstrand, or unstranded).

Requirements

how_are_we_stranded_here requires the following packages be installed:

kallisto == 0.44.x

python >= 3.6.0

RSeQC

It also requires a transcriptome annotation (a .fasta file, e.g. Ensembl's .cdna.fasta, or a prebuilt kallisto index) and a corresponding GTF file.

Pseudoalignment sometimes fails with newer versions of kallisto. If you run into this, we suggest downgrading to 0.44.0.
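Before running the tool, it can help to confirm that the external programs are reachable. The snippet below is a minimal sketch, not part of the package: it assumes the requirements above are exposed on your PATH as the executables kallisto and infer_experiment.py (RSeQC's script), and the helper name missing_tools is hypothetical.

```python
import shutil

# Executable names assumed from the requirements list; adjust if your
# installation exposes them under different names.
REQUIRED_TOOLS = ("kallisto", "infer_experiment.py")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```

This only checks that the programs can be found; it does not check the kallisto version.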

Installation

pip install how_are_we_stranded_here

Usage

For basic usage, run check_strandedness with a GTF transcript annotation, a transcripts FASTA file, and the FASTQ read files from one sample.

check_strandedness --gtf Yeast.gtf --transcripts Yeast_cdna.fasta --reads_1 Sample_A_1.fq.gz --reads_2 Sample_A_2.fq.gz
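If you want to call the tool from a Python script (for example, to loop over samples), one option is to wrap the command line above with subprocess. This is a sketch under the assumption that check_strandedness is on your PATH; the function names build_command and run_check are hypothetical, and only the four flags shown above are used.

```python
import subprocess


def build_command(gtf, transcripts, reads_1, reads_2):
    """Build the check_strandedness argument list from the flags shown above."""
    return [
        "check_strandedness",
        "--gtf", gtf,
        "--transcripts", transcripts,
        "--reads_1", reads_1,
        "--reads_2", reads_2,
    ]


def run_check(gtf, transcripts, reads_1, reads_2):
    """Run check_strandedness and return what it printed to standard output."""
    result = subprocess.run(
        build_command(gtf, transcripts, reads_1, reads_2),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

For the example above, the call would be run_check("Yeast.gtf", "Yeast_cdna.fasta", "Sample_A_1.fq.gz", "Sample_A_2.fq.gz").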

Output

check_strandedness prints the results of infer_experiment.py (http://rseqc.sourceforge.net/#infer-experiment-py) to the console, along with an interpretation.

checking strandedness
Reading reference gene model stranded_test_WT_yeast_rep1_1_val_1_1/Saccharomyces_cerevisiae.R64-1-1.98.bed ... Done
Loading SAM/BAM file ...  Total 20000 usable reads were sampled
This is PairEnd Data
Fraction of reads failed to determine: 0.0595
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0073 (0.8% of explainable reads)
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9332 (99.2% of explainable reads)
Over 90% of reads explained by "1+-,1-+,2++,2--"
Data is likely RF/fr-firststrand

Intermediate files are written to a folder in your current working directory; the folder name is derived from the name of the reads_1 file.
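To make the interpretation step concrete, the sketch below recomputes the "% of explainable reads" figures from the sample output above and applies the 90% rule. It is an illustration, not the package's own code: the function names are hypothetical, the threshold is assumed to apply to the share of explainable reads, and the FR label for the "1++,1--,2+-,2-+" pattern follows RSeQC's convention rather than anything printed above. Anything that does not pass the threshold is simply reported as "undetermined" here.

```python
import re

# Line format taken from the sample infer_experiment.py output above.
FRACTION_RE = re.compile(r'Fraction of reads explained by "([^"]+)": ([0-9.]+)')

# The RF label comes from the sample output; the FR label is an assumption
# based on RSeQC's convention for the opposite pattern.
LABELS = {
    "1++,1--,2+-,2-+": "FR/fr-secondstrand",
    "1+-,1-+,2++,2--": "RF/fr-firststrand",
}


def explainable_shares(output):
    """Map each read pattern to its share of the explainable reads."""
    fractions = {p: float(v) for p, v in FRACTION_RE.findall(output)}
    total = sum(fractions.values())
    if total == 0:
        return {}
    return {p: v / total for p, v in fractions.items()}


def interpret(output, threshold=0.9):
    """Return a strandedness label if one pattern exceeds the threshold."""
    for pattern, share in explainable_shares(output).items():
        if share > threshold:
            return LABELS.get(pattern, pattern)
    return "undetermined"


SAMPLE = '''Fraction of reads failed to determine: 0.0595
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0073
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9332
'''

print(interpret(SAMPLE))  # RF/fr-firststrand
```

With the sample values, 0.9332 / (0.0073 + 0.9332) is about 99.2%, which matches the percentage shown in the output above.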

How it Works

check_strandedness.py runs a series of commands to check in which direction reads align once mapped to transcripts.

It first creates a kallisto index (or uses a pre-made index) of your organism's transcriptome.

It then maps a small subset of reads (default 200000) to the transcriptome, and uses kallisto's --genomebam argument to project the pseudoalignments to a genome-sorted BAM file.

It finally runs RSeQC's infer_experiment.py to check in which direction the first and second reads of each pair align relative to the transcript strand, and reports the likely strandedness of your data.
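The three steps above can be sketched as the external commands they correspond to. This is a rough illustration under stated assumptions, not the package's actual implementation: the function name pipeline_commands and the index file name are hypothetical, the read files are assumed to be already subsampled, and the BED gene model that infer_experiment.py reads (see the sample output) is taken as a ready-made input because its derivation from the GTF is not described here. The BAM file name assumes kallisto's usual output name for --genomebam.

```python
import os


def pipeline_commands(transcripts_fasta, gtf, bed, reads_1, reads_2, out_dir,
                      index_name="kallisto_index"):
    """Return the three command lines as argument lists (nothing is executed)."""
    index_path = os.path.join(out_dir, index_name)
    # Assumed name of the genome-sorted BAM written by kallisto --genomebam.
    bam_path = os.path.join(out_dir, "pseudoalignments.bam")
    return [
        # 1. Build a kallisto index of the transcriptome.
        ["kallisto", "index", "-i", index_path, transcripts_fasta],
        # 2. Pseudoalign the reads and project them onto the genome.
        ["kallisto", "quant", "-i", index_path, "-o", out_dir,
         "--genomebam", "--gtf", gtf, reads_1, reads_2],
        # 3. Infer read orientation relative to the gene model.
        ["infer_experiment.py", "-r", bed, "-i", bam_path],
    ]


for command in pipeline_commands("Yeast_cdna.fasta", "Yeast.gtf", "Yeast.bed",
                                 "Sample_A_1.fq.gz", "Sample_A_2.fq.gz", "out"):
    print(" ".join(command))
```

Returning argument lists rather than running them keeps the sketch safe to try; each list could be passed to subprocess.run once the inputs exist.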


Version 1.0.1 (released Mar 9, 2021)
