Annotate genetic inheritance models in variant files

GENMOD
======

Tool for annotating patterns of inheritance in Variant Call Format (VCF)
files.

Each variant in the VCF file is annotated with the genetic models that
are followed in the family. The following genetic models are checked:

- Autosomal Recessive, denoted 'AR'
- Autosomal Recessive de novo, denoted 'AR\_dn'
- Autosomal Dominant, 'AD'
- Autosomal Dominant de novo, 'AD\_dn'
- Autosomal Compound Heterozygote, 'AR\_comp'
- X-linked dominant, 'XD'
- X-linked dominant de novo, 'XD\_dn'
- X-linked Recessive, 'XR'
- X-linked Recessive de novo, 'XR\_dn'

**GENMOD** will add entries to the INFO column of the given VCF file.
The new entries are:

- GM: A colon separated list with genetic models followed
- ANN: Colon separated list with features overlapped in the annotation
file
- Comp: Colon separated list with compound pairs (if any). These are
described like CHR\_POS\_REF\_ALT.
- MS: Model Score, a phred-score based on the genotype qualities to
describe the uncertainty of the genetic model.
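As a rough illustration of how these entries could be read back from an annotated file, the sketch below splits an INFO string on ``;`` and then splits the colon separated lists. The INFO string is invented for the example, and ``parse_genmod_info`` is a hypothetical helper, not a function provided by genmod.

```python
# Hypothetical helper, not part of genmod: read the GENMOD entries
# back out of a VCF INFO string.
LIST_KEYS = ("GM", "ANN", "Comp")  # entries described as colon separated lists


def parse_genmod_info(info):
    """Return the GENMOD entries found in an INFO string as a dict."""
    entries = {}
    for field in info.split(";"):
        key, _, value = field.partition("=")
        entries[key] = value
    # Split the list-valued entries on ':'; skip keys that are absent or empty.
    parsed = {key: entries[key].split(":") for key in LIST_KEYS if entries.get(key)}
    if "MS" in entries:
        parsed["MS"] = entries["MS"]  # kept as the raw string
    return parsed


# Made-up INFO value, for illustration only.
example = "DP=35;GM=AR:AR_dn;ANN=GENE1;Comp=1_1000_A_T:1_2000_G_C;MS=40"
result = parse_genmod_info(example)
```

Entries that do not belong to GENMOD (``DP`` in the made-up string) are ignored by this sketch.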

Installation:
-------------

genmod works with Python 2.7 and Python 3.

::

    pip install genmod

or

::

    git clone git@github.com:moonso/genmod.git
    cd genmod
    python setup.py install

USAGE:
~~~~~~

::

    genmod ped_file variant_file annotation_file

Conditions for Genetic Models
-----------------------------

Short explanation of genotype calls in VCF format:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since we only consider humans, who are diploid, a genotype represents
what is seen on both alleles at a single position. 0 represents the
reference sequence, 1 is the first of the alternative alleles, 2 the
second alternative and so on. If no phasing has been done the genotype is
an unordered pair of the form x/y, so 0/1 means that the individual is
heterozygous at this position, with the reference base on one of the
alleles and the first of the alternatives on the other. 2/2 means that
the second of the alternatives is seen on both alleles. Some chromosomes
are only present in one copy in humans; for these a single digit may be
used to show the genotype. A 0 would then mean reference and 1 the first
of the alternatives.

If phasing has been done, the pairs are no longer unordered and the
delimiter changes to '\|', so an individual can be heterozygous in two
ways: 0\|1 or 1\|0.
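The genotype notation described above can be sketched in a few lines of Python. The function names and the labels ``hom_ref``, ``het``, ``hom_alt`` and ``no_call`` are chosen for this example only and are not taken from the genmod code; a ``.`` allele is treated as a missing call.

```python
def parse_genotype(call):
    """Split a genotype call such as '0/1', '1|0' or '1' into (alleles, phased)."""
    phased = "|" in call
    separator = "|" if phased else "/"
    return tuple(call.split(separator)), phased


def classify(call):
    """Label a genotype call; the labels are illustrative only."""
    alleles, _ = parse_genotype(call)
    if "." in alleles:
        return "no_call"
    if all(allele == "0" for allele in alleles):
        return "hom_ref"
    if len(set(alleles)) == 1:
        # Same alternative on every copy; also covers single-digit calls like '1'.
        return "hom_alt"
    return "het"
```

With this sketch ``classify("0/1")`` and ``classify("1|0")`` both give ``"het"``, while ``classify("2/2")`` gives ``"hom_alt"``.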

Autosomal Recessive
~~~~~~~~~~~~~~~~~~~

In this model individuals can be carriers, so healthy individuals can be
heterozygous. Both alleles need to carry the variant for an individual to
be sick, so a healthy individual cannot be homozygous alternative and a
sick individual *has* to be homozygous alternative.

- Affected individuals have to be homozygous alternative (hom. alt.)
- Healthy individuals cannot be hom. alt.
- Variant is considered *de novo* if both parents are genotyped and do
not carry the variant
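The three conditions above can be written as a small check. This is a sketch under stated assumptions, not the genmod implementation: every family member is assumed to be genotyped, a family is given as a list of ``(genotype, affected)`` pairs, and all function names are made up for the example.

```python
def _alleles(genotype):
    """Split '0/1' or '0|1' into a list of allele strings."""
    return genotype.replace("|", "/").split("/")


def is_hom_alt(genotype):
    alleles = _alleles(genotype)
    return len(set(alleles)) == 1 and alleles[0] not in ("0", ".")


def carries_variant(genotype):
    return any(allele not in ("0", ".") for allele in _alleles(genotype))


def follows_autosomal_recessive(members):
    """members: list of (genotype, affected) pairs, all assumed genotyped."""
    # Affected must be hom. alt., healthy must not be hom. alt.
    return all(is_hom_alt(genotype) == affected for genotype, affected in members)


def is_de_novo(mother_gt, father_gt):
    """De novo only if both parents are genotyped and neither carries the variant."""
    both_genotyped = "." not in mother_gt and "." not in father_gt
    return (both_genotyped
            and not carries_variant(mother_gt)
            and not carries_variant(father_gt))
```

For example, an affected child with ``1/1`` and two healthy ``0/1`` parents follows the model, and the variant is not *de novo* because both parents carry it.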

Autosomal Dominant
~~~~~~~~~~~~~~~~~~

- Affected individuals have to be heterozygous (het.)
- Healthy individuals cannot have the alternative variant
- Variant is considered *de novo* if both parents are genotyped and do
not carry the variant
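A comparable sketch for the dominant conditions is given here. Heterozygous is read as exactly one reference allele and one alternative allele, everyone is assumed to be genotyped, and the names are again illustrative rather than taken from genmod.

```python
def _alleles(genotype):
    """Split '0/1' or '0|1' into a list of allele strings."""
    return genotype.replace("|", "/").split("/")


def is_heterozygous(genotype):
    alleles = _alleles(genotype)
    # Assumption: het. means one reference and one alternative allele.
    return len(alleles) == 2 and "." not in alleles and alleles.count("0") == 1


def carries_variant(genotype):
    return any(allele not in ("0", ".") for allele in _alleles(genotype))


def follows_autosomal_dominant(members):
    """members: list of (genotype, affected) pairs, all assumed genotyped."""
    for genotype, affected in members:
        if affected and not is_heterozygous(genotype):
            return False
        if not affected and carries_variant(genotype):
            return False
    return True
```

A healthy parent with ``0/1`` is enough to rule the model out, since healthy individuals cannot have the alternative variant.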

Autosomal Compound Heterozygote
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This model includes pairs of variants that are present within the same
feature.

1. Non-phased data:

- Affected individuals have to be het. for both variants
- Healthy individuals can be het. for one of the variants but cannot
have both variants
- Variant is considered *de novo* if only one or no variant is found
in the parents

2. Phased data:

- All affected individuals have to be het. for both variants and the
variants have to be on two different alleles
- Healthy individuals can be heterozygous for one or both of the
variants if they are on the same allele
- If only one or no variant is found in parents it is considered *de
novo*
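For phased data, the question of whether two variants sit on different alleles can be sketched as follows. The example assumes that healthy individuals may carry both variants only when these are on the same allele, and that every genotype is a phased call such as ``0|1``. The function names are hypothetical.

```python
def alt_haplotypes(genotype):
    """Return the haplotype indices (0 or 1) that carry a non-reference allele."""
    if "|" not in genotype:
        raise ValueError("expected a phased genotype such as 0|1")
    return {index for index, allele in enumerate(genotype.split("|"))
            if allele not in ("0", ".")}


def is_compound(gt_first, gt_second):
    """True if both variants are het. and located on different alleles."""
    first = alt_haplotypes(gt_first)
    second = alt_haplotypes(gt_second)
    return len(first) == 1 and len(second) == 1 and first != second


def follows_phased_compound(members):
    """members: list of (gt_first, gt_second, affected) for one variant pair."""
    # Affected must be compound; healthy must not be.
    return all(is_compound(first, second) == affected
               for first, second, affected in members)
```

An individual with ``0|1`` and ``1|0`` has one variant on each allele, whereas ``0|1`` and ``0|1`` places both variants on the same allele and leaves the other copy intact.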

X-Linked Dominant
~~~~~~~~~~~~~~~~~

These traits are inherited on the X chromosome, of which men have one
copy and women have two.

- Variant has to be on chromosome X
- Affected individuals have to be het. or hom. alt.
- Healthy individuals cannot carry the variant
- If the individual is male, the variant is considered *de novo* if the
mother is genotyped and does not carry the variant
- If the individual is female, the variant is considered *de novo* if
neither parent carries the variant
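The sex-dependent *de novo* rule is easy to get wrong, so a minimal sketch is given here. It assumes PED-style sex codes (1 for male, 2 for female), accepts ``X`` or ``chrX`` as the chromosome name, and uses illustrative function names that are not part of genmod.

```python
MALE, FEMALE = 1, 2  # assumed PED-style sex codes


def carries_variant(genotype):
    """Works for '0/1', '1|0' and single-digit calls such as '1'."""
    alleles = genotype.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def follows_x_dominant(chrom, members):
    """members: list of (genotype, affected) pairs, all assumed genotyped."""
    if chrom not in ("X", "chrX"):
        return False
    # Affected must carry the variant (het. or hom. alt.), healthy must not.
    return all(carries_variant(genotype) == affected
               for genotype, affected in members)


def x_dominant_de_novo(sex, mother_gt, father_gt):
    if sex == MALE:
        # Only the mother matters: she has to be genotyped and not a carrier.
        return "." not in mother_gt and not carries_variant(mother_gt)
    if sex == FEMALE:
        return not carries_variant(mother_gt) and not carries_variant(father_gt)
    return False  # unknown sex: make no claim
```

For a male child the genotype of the father is ignored, since a son does not inherit his X chromosome from his father.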

X-Linked Recessive
~~~~~~~~~~~~~~~~~~

- Variant has to be on chromosome X
- Affected individuals have to be het. or hom. alt.
- Healthy individuals cannot be hom. alt.
- If the individual is male, the variant is considered *de novo* if the
mother is genotyped and does not carry the variant
- If the individual is female, the variant is considered *de novo* unless
both parents carry the variant
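The recessive X-linked conditions differ from the dominant ones in two places: healthy individuals may be carriers as long as they are not hom. alt., and for a female the variant counts as *de novo* unless both parents carry it. A sketch under the same assumptions as before (PED-style sex codes, illustrative names, single-digit calls for males treated as hom. alt. when they show the variant):

```python
MALE, FEMALE = 1, 2  # assumed PED-style sex codes


def _alleles(genotype):
    return genotype.replace("|", "/").split("/")


def carries_variant(genotype):
    return any(allele not in ("0", ".") for allele in _alleles(genotype))


def is_hom_alt(genotype):
    alleles = _alleles(genotype)
    return len(set(alleles)) == 1 and alleles[0] not in ("0", ".")


def follows_x_recessive(chrom, members):
    """members: list of (genotype, affected) pairs, all assumed genotyped."""
    if chrom not in ("X", "chrX"):
        return False
    for genotype, affected in members:
        if affected and not carries_variant(genotype):
            return False
        if not affected and is_hom_alt(genotype):
            return False
    return True


def x_recessive_de_novo(sex, mother_gt, father_gt):
    if sex == MALE:
        return "." not in mother_gt and not carries_variant(mother_gt)
    if sex == FEMALE:
        # De novo unless both parents carry the variant.
        return not (carries_variant(mother_gt) and carries_variant(father_gt))
    return False  # unknown sex: make no claim
```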

.. raw:: html

<!-- ## Detailed Structure ##

Here all attributes and methods of the classes are shown:

### Genotype ###

Store the genotype information of a variant that is specific for an individual

**Attributes:**

* genotype STRING
* allele_1 STRING
* allele_2 STRING
* nocall BOOL
* heterozygote BOOL
* homo_alt BOOL
* homo_ref BOOL
* has_variant BOOL
* filter STRING
* ref_depth INT
* alt_depth INT
* phred_likelihoods TUPLE with INT
* depth_of_coverage INT
* genotype_quality FLOAT


### Variant ###

Holds the info of a variant and its specific behaviour in a family.

**Attributes**

* chr STRING Have to be string since X, Y
* start INT
* stop INT
* ref STRING Reference nucleotide(s)
* alt STRING Alternative sequence
* identity STRING dbSNP-id
* var_info DICT A dictionary with all the info from the variant file
* qual STRING A value for the score of the base call
* filter STRING The filter status
* genotypes LIST A list with the genotypes found for this variant
* gene STRING Semicolon separated string with Ensembl gene names
* ad BOOL If following Autosomal Dominant pattern
* ad_dn BOOL If following Autosomal Dominant De novo pattern
* ar BOOL If following Autosomal Recessive pattern
* ar_dn BOOL If following Autosomal Recessive De novo pattern
* ar_comp BOOL If following Autosomal Recessive compound pattern
* ar_comp_dn BOOL If following Autosomal Recessive Compound De Novo pattern

**Methods**

* get_variant(self):
Returns a dictionary with basic info to stdout
* print_model_info(self):
Print for each variant which patterns of inheritance they follow.
* print_vcf_variant(self):
Print the variant in vcf-format to stdout
* print_original_version(self, header_columns):
Prints the variant in its original format.
* check_noncomplete_call(self):
Check if GATK has failed to report some info.
* get_genotype(self):
Returns the list with genotypes for this variant.


### Individual ###

Holds the information about an individual and the individual specific genotypes.

**Attributes**

* ind STRING Can be any id unique within the family
* family STRING Can be any unique id within the cohort
* mother STRING The ind_id of the mother or [0,-9] if info is missing
* father STRING The ind_id of the father or [0,-9] if info is missing
* sex INT 1=male 2=female 0=unknown
* phenotype INT 1=unaffected, 2=affected, missing = [0,-9]
* genotypes DICT Container with genotype information on the form {<variant_id>: <Genotype>}
* phasing BOOL If the genotype information includes phasing for this individual

### Family ###

**Attributes**

* individuals DICT dictionary with family members on the form {<ind_id>:<Individual>}
* variants DICT dictionary with all the variants that exists in the family on the form {<var_id>:<Variant>} -->

