keggm

A tool for doing enrichment tests of functional groupings of genes across genomes and lineages.

Project description

Keggm - a small suite of tools I use to analyse microbial genomes

This is currently a very barebones package. Only two functions are fully operational.

enrichm looks for metabolic blocks which are enriched in your genomes compared to some background. This is a good way to get a quick idea of how your genomes differ from the background set in terms of certain metabolic functions. You can specify your own blocks and use custom protein names, which makes it quite extensible.
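The description above does not say which statistical test enrichm uses, so the following is only a rough sketch of the idea, assuming a one-sided Fisher's exact test (a hypergeometric tail probability). The function names, the input layout (a dict of genome name to protein set) and the rule that a genome "has" a block only when every protein in the block is present are all assumptions made for illustration, not keggm's actual API.

```python
from math import comb  # requires Python 3.8+


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): chance of seeing at least k block-carrying genomes in a
    target set of size n, when K of all N genomes carry the block."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def has_block(proteins, block):
    # Assumption: a block counts as present only if all its proteins are found.
    return block <= proteins


def block_enrichment(target, background, blocks):
    """Return {block name: one-sided p-value} for enrichment in the target set.

    target, background: dict of genome name -> set of protein names
    blocks: dict of block name -> set of required protein names
    """
    n = len(target)
    N = n + len(background)
    p_values = {}
    for name, block in blocks.items():
        k = sum(has_block(p, block) for p in target.values())
        K = k + sum(has_block(p, block) for p in background.values())
        p_values[name] = hypergeom_upper_tail(k, K, n, N)
    return p_values


# Hypothetical toy data: three target genomes, three background genomes.
example_blocks = {"blockA": {"protA", "protB"}}
example_target = {
    "t1": {"protA", "protB"},
    "t2": {"protA", "protB", "protC"},
    "t3": {"protA", "protB"},
}
example_background = {"b1": {"protA"}, "b2": {"protC"}, "b3": set()}
example_p = block_enrichment(example_target, example_background, example_blocks)
```

With many blocks tested at once, the resulting p-values would normally need a multiple-testing correction before being interpreted.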

completm is a small tool to aid in exploring what functions your genome can perform. It creates a “completeness” matrix which gives you an idea of whether your genome shows the potential to perform each metabolic block. It also creates a matrix with the protein names which contributed to that completeness, which can help you check whether a metabolic block was complete due to proteins which are normally poorly annotated and, if there are complementary proteins, which ones were present. This will be expanded to provide a list of “complete” modules for each organism based on some user threshold.
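To make the two matrices concrete, here is a minimal sketch of how a completeness value and its contributing proteins could be computed. It assumes a block is described as a list of steps, where each step is a set of interchangeable (complementary) proteins, and that completeness is the fraction of steps with at least one protein present. The data layout and function names are hypothetical and not taken from keggm itself.

```python
def block_completeness(genome_proteins, block_steps):
    """Return (fraction of steps satisfied, sorted contributing proteins).

    block_steps: list of sets; any one protein in a set satisfies that step.
    """
    satisfied = 0
    contributors = []
    for step in block_steps:
        present = sorted(step & genome_proteins)
        if present:
            satisfied += 1
            contributors.extend(present)
    return satisfied / len(block_steps), contributors


def completeness_matrices(genomes, blocks):
    """Build two nested dicts indexed as [genome][block]:
    one with completeness fractions, one with contributing protein names."""
    completeness = {}
    contributing = {}
    for genome, proteins in genomes.items():
        completeness[genome] = {}
        contributing[genome] = {}
        for name, steps in blocks.items():
            fraction, hits = block_completeness(proteins, steps)
            completeness[genome][name] = fraction
            contributing[genome][name] = hits
    return completeness, contributing


def complete_blocks(completeness, threshold=1.0):
    """List blocks per genome whose completeness reaches a user threshold."""
    return {
        genome: [b for b, frac in row.items() if frac >= threshold]
        for genome, row in completeness.items()
    }


# Hypothetical toy data: step 2 can be carried out by either p2 or p2alt.
toy_blocks = {"blockA": [{"p1"}, {"p2", "p2alt"}, {"p3"}]}
toy_genomes = {"g1": {"p1", "p2alt"}, "g2": {"p1", "p2", "p3"}}
toy_completeness, toy_contributing = completeness_matrices(toy_genomes, toy_blocks)
```

Keeping the contributing names alongside the fraction is what lets you see, for example, that g1 satisfied the second step through the alternative protein rather than the usual one.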

In the works

plots aims to create some small visualisations to better parse the completeness results. It consists of heatmaps, to quickly scan across genomes, and will later include arrow diagrams of each metabolic block where each arrow represents a protein. The arrows will be coloured based on the organisms which had the relevant proteins.

overlap tries to identify whether organisms have the potential to supplement each other. It does this by looking for metabolic blocks which are only partially complete in one organism but where the rest of the metabolic block can be found in another organism.

TODO:

Make a better test suite
Make it usable on the command line (for convenience)
Implement auxiliary non-enrichment features into main software
1. Plots with options to compare completeness
2. Visualisation of overlap within module across multiple genomes similar to Symbiodinium+coral paper
3. Some kind of colourisation of KEGG pathways to give you a broad idea of what’s present within a pathway.
Eventually, if I can, make the network stuff robust enough for the potential automated discovery of novel metabolism
Add more customisability - Make it so that any user can essentially create their own extra KEGG data for use in this software
1. requires a few auxiliary tools to augment permanent databases
Unify Database scraping and production to be a single command

Other todo:

Make it multithreaded/multiprocessor at the comparison stage (current scale of comparisons poses no speed issue)

Implement in terms of a producer-consumer model of multithreading
Investigate an optimised Boschloo’s test and whether it has a reasonable runtime (unlikely)
Implement a logging system for better debugging - will make my life easier.
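For the producer-consumer item above, a minimal sketch using only the standard library might look like the following. The `compare` callable and the task values are placeholders, not keggm functions. Note also that in CPython, threads do not run CPU-bound Python code in parallel because of the global interpreter lock, so a process-based variant (for example with `multiprocessing`) may be the better fit for heavy statistical comparisons.

```python
import queue
import threading

_SENTINEL = None  # marks the end of the task stream; tasks must not be None


def _producer(tasks, task_queue, n_consumers):
    for task in tasks:
        task_queue.put(task)
    # One sentinel per consumer so every worker thread shuts down.
    for _ in range(n_consumers):
        task_queue.put(_SENTINEL)


def _consumer(task_queue, results, lock, compare):
    while True:
        task = task_queue.get()
        if task is _SENTINEL:
            break
        value = compare(task)
        with lock:  # guard the shared results dict
            results[task] = value


def run_comparisons(tasks, compare, n_consumers=4):
    """Run compare(task) for each hashable task; return {task: result}."""
    task_queue = queue.Queue(maxsize=100)  # bounded, so the producer cannot race ahead
    results = {}
    lock = threading.Lock()
    workers = [
        threading.Thread(target=_consumer, args=(task_queue, results, lock, compare))
        for _ in range(n_consumers)
    ]
    for worker in workers:
        worker.start()
    _producer(tasks, task_queue, n_consumers)
    for worker in workers:
        worker.join()
    return results


# Placeholder comparison: the length of each task string.
demo_results = run_comparisons(["a", "bb", "ccc"], len, n_consumers=2)
```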

Project details

Release history

This version

0.0.1

Dec 21, 2016

Download files

Source Distribution

keggm-0.0.1.zip (35.2 kB)

Uploaded Dec 21, 2016 Source

Built Distribution

keggm-0.0.1-py2.py3-none-any.whl (44.0 kB)

Uploaded Dec 21, 2016 Python 2 Python 3

Hashes for keggm-0.0.1.zip

Algorithm	Hash digest
SHA256	`0994ba442b4b741f24faa3b1e7cf160f23a9c2f138de989d7ea926ffdcbec039`
MD5	`8f50124641b6e2584cd138170b8a587e`
BLAKE2b-256	`d9ba240ce889185407d6780e30560e902d67d594be1d831293a2facf221389f3`

Hashes for keggm-0.0.1-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`c0ab873424388839fb13732d970e307011be33b853ad59dbdf9d64fc73ee7395`
MD5	`b1588fe5c0b8cea8bf14ca01ae23e347`
BLAKE2b-256	`733f7a3c235b16c69e63a75382bca12ed5a7f96e11a5dc1d13ff6bf072c004af`