PyDAIR

Python library for diversity analysis of immune repertoire.

Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: GNU General Public License v2 (GPLv2)
Natural Language
- English
Operating System
- OS Independent
Programming Language
- Python :: 2.7
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

PyDAIR

PyDAIR is a Python package for studying immunoglobulin heavy (IGH) chain diversity from repertoire-sequencing (Rep-Seq) data generated with high-throughput sequencing technologies. PyDAIR identifies the germline variable (V), diversity (D), and joining (J) genes used by each IGH sequence. To assign these genes, it uses BLAST to align each sequence against a database of known germline V, D, and J genes. PyDAIR supports all features as long as the two motifs located at the end of the V gene and the start of the J gene are known. PyDAIR is available under the terms of the GNU General Public License v2 (GPLv2).

INSTALLATION

PyDAIR requires Python 2.7 together with the NumPy, Pandas, matplotlib, and BioPython packages. PyDAIR also requires NCBI BLAST+ for aligning IGH sequences to germline databases. PyDAIR is available on the PyPI repository and can be installed like any other Python package using the pip command.

pip install numpy --user
pip install pandas --user
pip install matplotlib --user
pip install biopython --user
pip install pydair --user

Installation instructions for NCBI BLAST+ are available on the NCBI website. Users should follow those instructions to install NCBI BLAST+.
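After installation, it can help to confirm that the required executables are reachable before running an analysis. The following is a minimal sketch, not part of PyDAIR: it assumes that the BLAST+ programs of interest are blastn and makeblastdb, and it is written for Python 3 so it can run independently of the Python 2.7 environment. Note that scripts installed with `pip install --user` may land in a directory that is not on PATH.

```python
import shutil

# Executables this sketch looks for. blastn and makeblastdb ship with
# NCBI BLAST+ (assumed to be the programs needed here); the two pydair-*
# commands are the ones named in this README.
REQUIRED_TOOLS = ["blastn", "makeblastdb", "pydair-parseseq", "pydair-analysis"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```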

Usage

PyDAIR has two main commands: pydair-parseseq and pydair-analysis.

Command	Function
pydair-parseseq	Identification of the V, D, and J genes used by each IGH sequence.
pydair-analysis	Aggregation of the frequencies of usage of V, D and J genes, as well as extraction of CDR-H3 sequences.

pydair-parseseq identifies the V, D, and J genes of each IGH sequence by aligning the sequence to germline (V, D, and J) databases using NCBI BLAST+. It requires IGH sequences, germline sequences, BLAST databases of the germline sequences, and BLAST parameters. The sequences should be given in FASTA format.

pydair-parseseq -q input_igh_sequences.fa \
                -v v.fa                   \
                -d d.fa                   \
                -j j.fa                   \
                --v-blastdb blastdb_v     \
                --d-blastdb blastdb_d     \
                --j-blastdb blastdb_j     \
                -o output1
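The --v-blastdb, --d-blastdb, and --j-blastdb arguments above expect BLAST databases that already exist. One way to create them is with the makeblastdb program from NCBI BLAST+. The sketch below (Python 3) builds one nucleotide database per germline FASTA file; the file and database names simply mirror the example command, and whether PyDAIR needs any additional makeblastdb options is not stated here, so treat this as an assumption.

```python
import subprocess

# Germline FASTA file -> BLAST database name, mirroring the example above.
GERMLINE_DATABASES = {
    "v.fa": "blastdb_v",
    "d.fa": "blastdb_d",
    "j.fa": "blastdb_j",
}


def makeblastdb_command(fasta_path, db_name):
    """Build the argument list for a nucleotide BLAST database."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "nucl", "-out", db_name]


def build_databases(databases=GERMLINE_DATABASES, run=False):
    """Return all makeblastdb commands; execute them only when run=True."""
    commands = [makeblastdb_command(fa, db) for fa, db in databases.items()]
    if run:
        for command in commands:
            # check=True raises CalledProcessError if makeblastdb fails.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in build_databases():
        print(" ".join(command))
```

Calling build_databases(run=True) executes the commands instead of only listing them.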

PyDAIR generates several files to save intermediate results, such as BLAST results and regions that cannot be aligned to V and J genes. The final result is saved to the output1.pydair file. If there are several samples, pydair-parseseq should be run once for each sample.
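Running pydair-parseseq once per sample can be scripted. The sketch below (Python 3) reuses only the options shown in the example command; the per-sample FASTA file names (fugu1.fa and so on) are hypothetical placeholders, and the output prefixes are chosen so that each run produces one .pydair file.

```python
import subprocess

# (query FASTA, output prefix) per sample. The FASTA names are hypothetical.
SAMPLES = [
    ("fugu1.fa", "output1"),
    ("fugu2.fa", "output2"),
    ("fugu3.fa", "output3"),
]


def parseseq_command(query_fasta, output_prefix):
    """Build a pydair-parseseq call using the options from the example above."""
    return [
        "pydair-parseseq",
        "-q", query_fasta,
        "-v", "v.fa",
        "-d", "d.fa",
        "-j", "j.fa",
        "--v-blastdb", "blastdb_v",
        "--d-blastdb", "blastdb_d",
        "--j-blastdb", "blastdb_j",
        "-o", output_prefix,
    ]


def parse_all_samples(samples=SAMPLES, run=False):
    """Return one command per sample; execute them only when run=True."""
    commands = [parseseq_command(fasta, prefix) for fasta, prefix in samples]
    if run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in parse_all_samples():
        print(" ".join(command))
```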

The statistical summaries are calculated by the pydair-analysis command.

pydair-analysis -i output1.pydair output2.pydair output3.pydair  \
                -n Fugu1 Fugu2 Fugu3                             \
                -o stats_result                                  \
                --contain_ambiguous_D
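When many samples are involved, the -i and -n lists are easy to get out of step. The following sketch keeps each sample name paired with its .pydair file and assembles the command from those pairs. It assumes that the names given to -n correspond positionally to the files given to -i, as the example above suggests; only options shown in that example are used.

```python
def analysis_command(samples, output_prefix, contain_ambiguous_d=True):
    """Build a pydair-analysis call from (sample name, .pydair file) pairs."""
    if not samples:
        raise ValueError("at least one sample is required")
    names = [name for name, _ in samples]
    files = [path for _, path in samples]
    if len(set(names)) != len(names):
        raise ValueError("sample names must be unique")
    command = ["pydair-analysis", "-i"] + files + ["-n"] + names
    command += ["-o", output_prefix]
    if contain_ambiguous_d:
        command.append("--contain_ambiguous_D")
    return command


if __name__ == "__main__":
    pairs = [
        ("Fugu1", "output1.pydair"),
        ("Fugu2", "output2.pydair"),
        ("Fugu3", "output3.pydair"),
    ]
    print(" ".join(analysis_command(pairs, "stats_result")))
```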

Release history

Version	Release date
0.1.16	Jul 15, 2020
0.1.14	Jul 24, 2017
0.1.10	Dec 11, 2016
0.1.2 (this version)	Jun 1, 2016

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

PyDAIR-0.1.2-py2.py3-none-any.whl (37.2 kB)

Uploaded Jun 1, 2016 Python 2 Python 3

Hashes for PyDAIR-0.1.2-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`fccd2cc9a92067fb150022c2293200a886147e4d85f8929c54c73871888b099a`
MD5	`7307b9451360cb98dc7c43c25a0b56d5`
BLAKE2b-256	`f13700f3bfe996056d9a484e7f0d4a25bd09e02fdb44845a83b94efe8dfb8dc9`
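A downloaded wheel can be checked against the SHA256 digest listed above. The sketch below uses only Python's standard hashlib module; the local file path is an assumption and should point to wherever the wheel was saved.

```python
import hashlib

# SHA256 digest listed above for PyDAIR-0.1.2-py2.py3-none-any.whl.
EXPECTED_SHA256 = "fccd2cc9a92067fb150022c2293200a886147e4d85f8929c54c73871888b099a"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_expected("PyDAIR-0.1.2-py2.py3-none-any.whl") returns True only if the file in the current directory has the listed digest.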