
vireoSNP - donor deconvolution for multiplexed scRNA-seq data


vireo: donor deconvolution for pooled single-cell data

Vireo: Variational Inference for Reconstructing Ensemble Origin by expressed SNPs in multiplexed scRNA-seq data.

The name vireo follows the theme from cardelino (for clone deconvolution), while the Python package name is vireoSNP to avoid a name conflict on PyPI.

Installation

Vireo is available through PyPI. To install it, run the following command (add -U to upgrade an existing installation):

pip install vireoSNP

Alternatively, you can download or clone this repository and run python setup.py install. In either case, add --user if you do not have root permission or write permission for your Python environment.
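To confirm that the installation worked, one option is to ask Python for the installed version of the distribution. The sketch below uses only the standard library (importlib.metadata, Python 3.8+); the helper name installed_version is hypothetical and not part of vireoSNP.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent.

    `installed_version` is a hypothetical helper for this example only.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if pip installed the package, else a notice.
    print("vireoSNP:", installed_version("vireoSNP") or "not installed")
```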

For more instructions, see the installation manual.

Quick Usage

The following two subsections are a quick usage guide. For more details, see the full manual or type vireo -h to list all arguments. We also provide a demo.sh for running the test data sets in this repo.

Genotyping for each cell (pre-step)

This step may take some bioinformatics effort, but several existing tools can handle it. It usually involves two steps:

  1. identify candidate SNPs: known common SNPs / freebayes / cellSNP

  2. genotype candidate SNPs in each cell: cellSNP / vartrix / bcftools mpileup

For a fuller introduction, see the genotyping section.
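To make the output of the genotyping pre-step concrete, here is a toy sketch of per-cell allele counting at candidate SNPs. It is not how cellSNP, vartrix, or bcftools work internally. It assumes that AD means the count of reads carrying the alternative allele and DP the count of reads carrying either the reference or the alternative allele. The function name count_alleles and the input layout are hypothetical.

```python
from collections import defaultdict


def count_alleles(observations, candidate_snps):
    """Toy per-cell allele counting (illustrative only).

    observations:   iterable of (cell, snp, base) tuples, one per read.
    candidate_snps: dict mapping snp -> (ref_base, alt_base).
    Returns two dicts keyed by (snp, cell): AD (assumed here to be the
    alt-allele count) and DP (ref + alt count). Missing keys mean zero,
    mimicking a sparse matrix.
    """
    ad = defaultdict(int)
    dp = defaultdict(int)
    for cell, snp, base in observations:
        if snp not in candidate_snps:
            continue  # step 1 decided this position is not a candidate SNP
        ref, alt = candidate_snps[snp]
        if base not in (ref, alt):
            continue  # ignore bases matching neither allele
        dp[(snp, cell)] += 1
        if base == alt:
            ad[(snp, cell)] += 1
    return dict(ad), dict(dp)


# Made-up reads: three from cellA and three from cellB.
reads = [
    ("cellA", "chr1:100", "A"),
    ("cellA", "chr1:100", "G"),
    ("cellA", "chr1:100", "G"),
    ("cellB", "chr1:100", "A"),
    ("cellB", "chr2:50", "T"),
    ("cellB", "chr9:1", "C"),  # not a candidate SNP, skipped
]
snps = {"chr1:100": ("A", "G"), "chr2:50": ("C", "T")}
AD, DP = count_alleles(reads, snps)
```

Most (SNP, cell) pairs have no reads at all in real data, which is why a sparse representation is the natural choice for these counts.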

Demultiplexing from allelic expression

The vireoSNP Python package offers a set of utility functions and an executable command-line tool, vireo, for donor deconvolution in any of these four situations:

Mode 1: without any genotype:

vireo -c $CELL_DATA -N $n_donor -o $OUT_DIR

Mode 2: with genotype for all samples (specify the genotype tag with -t: GT, GP, or PL)

vireo -c $CELL_DATA -d $DONOR_GT_FILE -o $OUT_DIR

Mode 3: with genotype for part of the samples (N differs from the number of samples in $DONOR_GT_FILE)

vireo -c $CELL_DATA -d $DONOR_GT_FILE -o $OUT_DIR -N $n_donor

Mode 4: with genotype that is not fully trusted

vireo -c $CELL_DATA -d $DONOR_GT_FILE -o $OUT_DIR --forceLearnGT

In modes 3 and 4, the algorithm first runs mode 1 to estimate the genotypes of N donors and matches them to the given donor genotypes (even if partial). For the matched samples and SNPs, the input genotypes then replace the estimated values as a prior in the second run.
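When vireo is called from a pipeline, the four modes differ only in which options are passed. The sketch below assembles the argument list using only the options shown above (-c, -d, -t, -o, -N, --forceLearnGT). The function build_vireo_command and its validation rules are hypothetical conveniences, not part of vireoSNP, and the file names are placeholders.

```python
import shlex

GENO_TAGS = ("GT", "GP", "PL")


def build_vireo_command(cell_data, out_dir, n_donor=None,
                        donor_gt_file=None, geno_tag=None,
                        force_learn_gt=False):
    """Assemble a vireo command line as a list of arguments.

    Hypothetical helper: it only combines options documented above.
    """
    if donor_gt_file is None:
        if n_donor is None:
            raise ValueError("mode 1 (no genotype) needs n_donor")
        if geno_tag is not None or force_learn_gt:
            raise ValueError("-t and --forceLearnGT need a donor genotype file")
    if geno_tag is not None and geno_tag not in GENO_TAGS:
        raise ValueError("geno_tag must be one of GT, GP, PL")

    cmd = ["vireo", "-c", cell_data]
    if donor_gt_file is not None:
        cmd += ["-d", donor_gt_file]          # modes 2, 3 and 4
    if geno_tag is not None:
        cmd += ["-t", geno_tag]
    cmd += ["-o", out_dir]
    if n_donor is not None:
        cmd += ["-N", str(n_donor)]           # modes 1 and 3
    if force_learn_gt:
        cmd.append("--forceLearnGT")          # mode 4
    return cmd


if __name__ == "__main__":
    # Mode 3: partial genotypes, four donors expected in the pool.
    cmd = build_vireo_command("cells.vcf.gz", "vireo_out",
                              n_donor=4, donor_gt_file="donors.vcf.gz")
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(...) to execute
```

Building a list rather than a single string avoids shell-quoting problems when paths contain spaces.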

Note that the cell data ($CELL_DATA) given via -c can be in either of the following two formats:

  • standard VCF file (compressed or uncompressed) with variants by cells

  • a cellSNP output folder containing VCF for variants info and sparse matrices AD and DP
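A wrapper script may want to check which of the two formats it has been given before calling vireo. The sketch below is one way to do that, assuming that a directory is a cellSNP output folder and that VCF files end in .vcf or .vcf.gz. Both rules, and the function name classify_cell_data, are assumptions for illustration. vireo itself may decide differently.

```python
import os


def classify_cell_data(path):
    """Guess the $CELL_DATA format from the path (hypothetical helper).

    Returns "folder" for a directory (assumed to be cellSNP output) or
    "vcf" for a file named *.vcf or *.vcf.gz; raises ValueError otherwise.
    """
    if os.path.isdir(path):
        return "folder"
    if os.path.isfile(path) and path.lower().endswith((".vcf", ".vcf.gz")):
        return "vcf"
    raise ValueError("not a directory or a .vcf/.vcf.gz file: %r" % path)
```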

Reference

Yuanhua Huang, Davis J. McCarthy, and Oliver Stegle. Vireo: Bayesian demultiplexing of pooled single-cell RNA-seq data without genotype reference. bioRxiv (2019): 598748.

