Skip to main content

Analysis of the .gaf files for futher debias analysis.

Project description

# GOFindBias: Analysis tool for finding bias in the GAF files.
GOFindBias is developed to provide the user with some insightful statistics about the [GAF](http://www.geneontology.org/page/go-annotation-file-formats) file to determine if the conclusions on the gene ontology studies can be biased because of abstract terms or the high throughput experiments([1](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003063))..


Statistics Provided by the tool are as follows:
1. Shannon's equitability.
2. Top 'n' PubMed and GO terms.
3. KS test to compare two different [GAF](http://www.geneontology.org/page/go-annotation-file-formats) files.
4. Mutual terms between two GAF files which are among 'n' most frequent terms and their respective frequencies.


### Prerequisites:
#### Required modules.

Modules are available in most GNU/Linux distributions, or from their respective websites.

* [Matplotib](https://matplotlib.org/)

* [Biopython](http://biopython.org/)

### Installation

Installing from source
```
git clone https://github.com/Pranavkhade/GOFindBias
cd GOFindBias
python setup.py install
```

Installing with pip
```
pip install GOFindBias
```
OR
```
pip install git+git://github.com/Pranavkhade/GOFindBias
```

### Files and instructions

1. Collect the GAF file you wish to analyse. For reference GAF files you can visit ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/
2. You can also use the GAF files obtained as an output from the [debias](https://github.com/Rinoahu/debias) program.
3. Instructions and help for the parameter is as follows:

```
usage: GOFindBias.py [-h]
(-i GAF_FILE [GAF_FILE ...] | -cmpr FILENAME FILENAME)
[-ls 1 OR O] [-ts TOP]
[-e EVIDENCE_CODE [EVIDENCE_CODE ...]]

GOFindBias is an analytical tool built to analyse the .gaf files. Please
visit: https://github.com/Pranavkhade/GOFindBias for more details.

optional arguments:
-h, --help show this help message and exit
-i GAF_FILE [GAF_FILE ...], --input GAF_FILE [GAF_FILE ...]
Names of the input GAF file(s).
-cmpr FILENAME FILENAME, --compare FILENAME FILENAME
Names of the two GAF files
-ls 1 OR O, --logscale 1 OR O
For graphs, 0: Counts in normal scale 1: Counts in log
scale [default=0]
-ts TOP, --topstat TOP
Top n statistics sorted from highest to lowest
[default=10]
-e EVIDENCE_CODE [EVIDENCE_CODE ...], --evidence EVIDENCE_CODE [EVIDENCE_CODE ...]
Accepts Standard Evidence Codes outlined in
(http://geneontology.org/page/guide-go-evidence-
codes). All 3 letter code for each standard evidence
is acceptable. In addition to that EXPEC is accepted
which will pull out all annotations which are made
experimentally. COMPEC will extract all annotations
which have been done computationally. Similarly,
AUTHEC and CUREC are also accepted.
```
### Examples

1. `GOFindBias -i test/2014.gaf -ls 1 -ts 10`
This command will parse 2014.gaf file for the analysis and all the GO Term counts will be represented on the Natural Log scale for better comparitive visualisation of the data. The last argument is the number of top 'n' entries with highest count in the GAF file. The output of graphs will be posted in the `/graph_output` folder with names corrosponding to GO/PubMed ID count and the ontology level (F/C/P). File named `Shannon's_Statistics.txt` will have the information about the diversity of a given .gaf file.

2. `GOFindBias -cmpr test/2014.gaf test/2015.gaf -ts 50`
This command will compare two files and will give KS(non-parametric) Test p-value. Along with it, it will create `COMPARE.txt` having common GO terms and PMID between top 50(n) most frequent terms from each GAF file.

3. `GOFindBias -i test/2014.gaf test/2015.gaf test/2016.gaf -e EXPEC`
This command will parse all the mentioned files and will fetch statistics for only [Experimental Evidence Codes](http://geneontology.org/page/experimental-evidence-codes).

### NOTE

1. If you are using Anaconda environment, make sure that the Python reads libraries from "~anaconda2/lib/python2.7/site-packages/lib". You can also simply copy the files in that location to an appropriate path where other python libraries are readable (importable) by Python.
2. You can find few test .gaf files in the /lib/test/ folders.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

GOFindBias-1.2.3b1-py2.py3-none-any.whl (9.8 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page