My python library of classes and functions that help me work

[![Build Status](https://travis-ci.org/childsish/lhc-python.svg?branch=master)](https://travis-ci.org/childsish/lhc-python)

lhc-python
==========

This is my personal library of Python classes and functions, many of which have bioinformatics applications. The library changes constantly and on a whim, so approach it with caution if you want to use it. Over time, however, parts appear to be settling into a stable configuration.

binf
----
A collection of bioinformatics-related modules

### binf.collection
Several collection classes that handle biological data. They are designed to read from file formats that have strict standards and provide some access benefits, e.g. NetCDF and SQLite.

* **binf.collection.marker_set.** A class designed to hold marker data. Implemented as NetCDF4. A marker is a genomic position that varies across several genotypes, so the data is stored as a matrix of genotype x genomic position.

Initialising this class takes a couple of steps. You need to provide a reference sequence and the positions of the reference sequence. The reference sequence is an n x m matrix where n is the number of positions and m is the ploidy. The positions are provided as an ordered dictionary mapping each chromosome to a list of positions.
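```python
from collections import OrderedDict

import numpy as np

from lhc.binf.collection.marker_set import Reference, MarkerSet

fname = 'markers.nc'  # placeholder path; the original snippet used self.fname
mrks = 'ACGATCAGGCT'  # one reference allele per marker position (11 in total)
# Stack two copies (diploid) and transpose to get the n x m matrix
ref = Reference(ref=np.vstack([list(mrks), list(mrks)]).T,
                poss=OrderedDict([
                    ('Chr1', [5, 10, 15, 20, 25]),
                    ('Chr2', [4, 6, 8, 10, 12, 14])
                ]))
mrk_set = MarkerSet(fname, ref)
```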
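To make the layout of these two inputs concrete, here is a minimal pure-Python sketch that does not use the library. It assumes that row i of the reference matrix corresponds to the i-th position when the chromosomes are walked in dictionary order; that mapping and the helper `alleles_at` are illustrative assumptions, not part of lhc-python.

```python
from collections import OrderedDict

# Same inputs as the example above.
mrks = 'ACGATCAGGCT'
poss = OrderedDict([
    ('Chr1', [5, 10, 15, 20, 25]),
    ('Chr2', [4, 6, 8, 10, 12, 14]),
])

# n x m reference matrix as a list of rows: n positions, m = 2 (diploid).
# Same shape as np.vstack([list(mrks), list(mrks)]).T.
ref_matrix = [[base, base] for base in mrks]

# ASSUMPTION: rows follow the chromosomes in dictionary order.
flat_positions = [(chrom, pos)
                  for chrom, positions in poss.items()
                  for pos in positions]

# One row per position across all chromosomes.
assert len(ref_matrix) == len(flat_positions)


def alleles_at(chrom, pos):
    """Return the alleles stored for one position (hypothetical helper)."""
    return ref_matrix[flat_positions.index((chrom, pos))]


print(alleles_at('Chr2', 4))
```

The point of the sketch is only that the number of rows in the reference matrix has to match the total number of positions listed across all chromosomes.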
For each genotype, call the registerGenotype function with the name of the genotype and its markers.
```python
# Alleles for this genotype, in the same order as the reference positions
mrks = 'ACGACTGGGCT'
mrk_set.registerGenotype('genotype_A',
                         np.vstack([list(mrks), list(mrks)]).T)
```
You can now use the functions getMarkerAtPosition and getMarkersInInterval to retrieve the desired markers. You can also use getGenotype to get all the markers for a particular genotype.

You can register alternative names for a genotype by passing the main_name argument to the registerGenotype function.
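As a rough illustration of what an interval lookup over sorted marker positions involves, here is a standalone sketch using the standard-library `bisect` module. It is not how getMarkersInInterval is implemented; the helper `markers_in_interval`, its signature and the half-open interval convention are all assumptions made for the example.

```python
from bisect import bisect_left

# Sorted marker positions on Chr1, taken from the example above.
chr1_positions = [5, 10, 15, 20, 25]
# genotype_A alleles at those positions, one diploid pair per position.
chr1_alleles = [('A', 'A'), ('C', 'C'), ('G', 'G'), ('A', 'A'), ('C', 'C')]


def markers_in_interval(positions, alleles, start, stop):
    """Return (position, alleles) pairs with start <= position < stop.

    Hypothetical helper; the half-open interval is an assumption.
    """
    lo = bisect_left(positions, start)  # first index with position >= start
    hi = bisect_left(positions, stop)   # first index with position >= stop
    return list(zip(positions[lo:hi], alleles[lo:hi]))


hits = markers_in_interval(chr1_positions, chr1_alleles, 10, 20)
print(hits)
```

Because the positions are kept sorted, both ends of the interval are found by binary search rather than by scanning every marker.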

* **binf.collection.model_set.** A class designed to hold gene models. Implemented as SQLite with R*tree support to enable fast interval queries.

* **binf.collection.sequence_set.** A class designed to hold several sequences (probably belonging to a single species). Implemented as NetCDF4. Provides fast access to sequences.

* **binf.collection.variant_set.** A class designed to hold the variant positions for a single genotype/sample. Will be re-implemented as SQLite.
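For readers unfamiliar with SQLite's R*Tree module, the sketch below shows the general idea behind a fast interval query on gene models. The table names, column names and the helper `models_overlapping` are hypothetical and do not reflect the actual model_set schema. It also assumes the Python `sqlite3` build includes the R*Tree extension, which is common but not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# R*Tree virtual table: an integer id plus one min/max pair per dimension.
# Coordinates are stored as 32-bit floats; rtree_i32 stores integers instead.
conn.execute(
    'CREATE VIRTUAL TABLE model_index USING rtree(id, start_pos, stop_pos)')
conn.execute(
    'CREATE TABLE model (id INTEGER PRIMARY KEY, name TEXT, chrom TEXT)')

# Made-up gene models: (id, name, chromosome, start, stop).
models = [
    (1, 'geneA', 'Chr1', 100, 500),
    (2, 'geneB', 'Chr1', 450, 900),
    (3, 'geneC', 'Chr1', 2000, 2500),
]
for id_, name, chrom, start, stop in models:
    conn.execute('INSERT INTO model VALUES (?, ?, ?)', (id_, name, chrom))
    conn.execute('INSERT INTO model_index VALUES (?, ?, ?)',
                 (id_, start, stop))


def models_overlapping(conn, chrom, start, stop):
    """Names of models on chrom that overlap [start, stop] (hypothetical)."""
    rows = conn.execute(
        'SELECT model.name FROM model '
        'JOIN model_index ON model.id = model_index.id '
        'WHERE model.chrom = ? '
        'AND model_index.start_pos <= ? AND model_index.stop_pos >= ? '
        'ORDER BY model.id',
        (chrom, stop, start))
    return [name for (name,) in rows]


print(models_overlapping(conn, 'Chr1', 480, 600))
```

Two intervals overlap when each starts before the other ends, which is what the two comparisons in the WHERE clause express. The R*Tree index lets SQLite answer that condition without scanning every model.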

collection
----------
Several collections mostly to do with intervals

file_format
-----------
Parsers for several file formats

stats
-----
Some experimental stats modules mostly to do with cumulative stats

test
----
Unit tests. Completely out-of-date...
