Python interface for tabix
Project description
April 16, 2014
This module allows fast random access to files compressed with bgzip and indexed by tabix. It includes a C extension with code from klib. The bgzip and tabix programs are available here.
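As a rough sketch of how such a file might be prepared, the snippet below builds the bgzip and tabix command lines for a tab-delimited file that is already sorted by sequence name and position. The file name `snps.txt`, the helper `index_commands`, and the column layout (sequence name in column 1, position in column 2) are assumptions made for illustration. A header line such as the one in the sample table under Synopsis would need separate handling, which is not shown here.

```python
import os
import shutil
import subprocess


def index_commands(path, seq_col=1, begin_col=2, end_col=2):
    """Build the bgzip and tabix command lines for a sorted, tab-delimited file.

    Hypothetical helper: the column numbers are 1-based and describe where
    the sequence name, start, and end live in each row.
    """
    compressed = path + ".gz"
    return [
        # bgzip writes the block-compressed file next to the input.
        ["bgzip", path],
        # tabix -s/-b/-e name the sequence, begin, and end columns.
        ["tabix", "-s", str(seq_col), "-b", str(begin_col),
         "-e", str(end_col), compressed],
    ]


commands = index_commands("snps.txt")
for cmd in commands:
    print(" ".join(cmd))

# Only run the tools when both programs and the input file are present.
tools_found = all(shutil.which(cmd[0]) for cmd in commands)
if tools_found and os.path.exists("snps.txt"):
    for cmd in commands:
        subprocess.check_call(cmd)
```

For a position-only table the begin and end columns are the same, which is why both default to column 2 in this sketch.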
Installation
pip install --user pytabix
Synopsis
Genomic data is often stored as a table in which each row corresponds to a genomic region (start, end) or a single position:
    chrom  pos      snp
    1      1000760  rs75316104
    1      1000894  rs114006445
    1      1000910  rs79750022
    1      1001177  rs4970401
    1      1001256  rs78650406
With tabix, you can quickly retrieve all rows in a genomic region by specifying a query with a sequence name, start, and end:
    import tabix

    # Open a remote or local file.
    url = "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/"
    url += "ALL.2of4intersection.20100804.genotypes.vcf.gz"
    tb = tabix.open(url)

    # These queries are identical. A query returns an iterator over the results.
    records = tb.query("1", 1000000, 1250000)
    records = tb.queryi(0, 1000000, 1250000)
    records = tb.querys("1:1000000-1250000")

    # Each record is a list of strings. Print the first five fields
    # of the first record only.
    for record in records:
        print(record[:5])
        break
['1', '1000071', '.', 'C', 'T']
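To make the query semantics concrete, here is a minimal sketch in plain Python that splits a region string into its three parts and selects matching rows from the sample table by linear scan. It does not use pytabix and is not how tabix works internally (tabix relies on its index to avoid scanning the whole file). The helpers `parse_region` and `rows_in_region` are hypothetical, and the inclusive treatment of both bounds is an assumption; check pytabix's behaviour at region boundaries before relying on it.

```python
import re

# Sample rows copied from the table in the Synopsis (header line omitted).
ROWS = [
    ("1", 1000760, "rs75316104"),
    ("1", 1000894, "rs114006445"),
    ("1", 1000910, "rs79750022"),
    ("1", 1001177, "rs4970401"),
    ("1", 1001256, "rs78650406"),
]


def parse_region(region):
    """Split a 'name:start-end' string into (name, start, end)."""
    match = re.match(r"^([^:]+):(\d+)-(\d+)$", region)
    if match is None:
        raise ValueError("expected 'name:start-end', got %r" % (region,))
    name, start, end = match.groups()
    return name, int(start), int(end)


def rows_in_region(rows, name, start, end):
    """Return rows on sequence `name` with start <= pos <= end (assumed inclusive)."""
    return [row for row in rows if row[0] == name and start <= row[1] <= end]


name, start, end = parse_region("1:1000800-1001200")
hits = rows_in_region(ROWS, name, start, end)
for row in hits:
    print(row)
```

The parsed tuple corresponds to the arguments of the three-argument query form, and the string corresponds to the single-argument form.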
Download files
Source distribution: pytabix-0.0.2.tar.gz (46.8 kB)