fastqcparser

Python API for parsing FastQC output

# Welcome to fastqcparser

Python API for parsing the output of [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

# Installation

1. The recommended way to install is with `pip`:

```
pip install fastqcparser
```

2. Alternatively, you can install with `easy_install`:

```
easy_install fastqcparser
```

3. You can also install from the source repository (hosted on Bitbucket):

```
cd
git clone http://bitbucket.org/bubioinformaticshub/fastqcparser.git
cd fastqcparser
python setup.py install
```
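
After any of the install routes above, a quick way to confirm that the package is visible to the interpreter is to look it up with the standard library. The sketch below is only a convenience check and not part of fastqcparser; `describe_install` is a hypothetical helper name, and it assumes the import name matches the distribution name (`fastqcparser`), as the `pip` command and the import statement in this README suggest.

```python
from importlib import metadata, util


def describe_install(package):
    """Return a one-line status string for a top-level package name."""
    # find_spec returns None when the interpreter cannot locate the package.
    if util.find_spec(package) is None:
        return f"{package}: not installed"
    try:
        # Version as recorded in the installed distribution metadata.
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata under that name.
        version = "unknown version"
    return f"{package}: installed ({version})"


print(describe_install("fastqcparser"))
```

If this reports "not installed", the install most likely went to a different Python environment than the one running the check.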

# Usage/lazy documentation

```python

# import fastqcparser
from pprint import pprint
from fastqcparser import FastQCParser

# load file
f = FastQCParser('/path/to/fastqc_output_file.txt')

# or
f = FastQCParser('/path/to/fastqc.zip')

# or
with open('/path/to/fastqc_data.txt') as fp:
    f = FastQCParser(fp)

# or
with FastQCParser('/path/to/fastqc_output_file.txt') as f:
    print(f)

# some convenience fields are available from the Basic Statistics module
# (str() guards against fields that have been cast to ints or floats)
print('\n'.join(str(value) for value in [
    f.filename,
    f.file_type,
    f.encoding,
    f.total_sequences,
    f.filtered_sequences,
    f.sequence_length,
    f.percent_gc
]))

# the available modules are in f.modules
pprint(list(f.modules.keys()))

#['Basic Statistics',
# 'Per base sequence quality',
# 'Per sequence quality scores',
# 'Per base sequence content',
# 'Per base GC content',
# 'Per sequence GC content',
# 'Per base N content',
# 'Sequence Length Distribution',
# 'Sequence Duplication Levels',
# 'Overrepresented sequences',
# 'Kmer Content']

# you can access an individual module either as a key of f.modules or using
# f itself:
pprint(f.modules['Basic Statistics'])
pprint(f['Basic Statistics'])

# each module contains a dictionary
pprint(f['Basic Statistics'])

#{'addnl': {},
# 'data': [['Filename', 'sample1.fastq'],
#          ['File type', 'Conventional base calls'],
#          ['Encoding', 'Sanger / Illumina 1.9'],
#          ['Total Sequences', 1571332],
#          ['Filtered Sequences', 0],
#          ['Sequence length', 29],
#          ['%GC', 53]],
# 'fieldnames': ['Measure', 'Value'],
# 'name': 'Basic Statistics',
# 'status': 'pass'}

# 'data' contains the tabular data from the module as a list of lists, with
# numerical values cast to ints and floats as appropriate

# 'fieldnames' contains the names of each column in 'data'

# 'name' is the name of the module, same as the key

# 'status' is pass/warn/fail as reported by fastqc

# 'addnl' contains extra fields for some modules
```
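
Because each module is a plain dictionary with `data`, `fieldnames` and `status` keys, ordinary Python is enough to reshape it. The sketch below does not import fastqcparser: it uses a hand-written stand-in for `f.modules`, with the 'Basic Statistics' values taken from the output above and a second module whose values are made up for illustration. The helper names `rows_as_dicts` and `modules_by_status` are hypothetical and not part of the library.

```python
from collections import defaultdict

# Stand-in for f.modules. 'Basic Statistics' is abridged from the output
# shown above; the second entry is illustrative, not real FastQC output.
modules = {
    'Basic Statistics': {
        'addnl': {},
        'data': [['Filename', 'sample1.fastq'],
                 ['Total Sequences', 1571332],
                 ['%GC', 53]],
        'fieldnames': ['Measure', 'Value'],
        'name': 'Basic Statistics',
        'status': 'pass',
    },
    'Per base N content': {
        'addnl': {},
        'data': [[1, 0.0], [2, 0.01]],
        'fieldnames': ['Base', 'N-Count'],
        'name': 'Per base N content',
        'status': 'warn',
    },
}


def rows_as_dicts(module):
    """Pair each row of 'data' with the column names in 'fieldnames'."""
    return [dict(zip(module['fieldnames'], row)) for row in module['data']]


def modules_by_status(modules):
    """Group module names under their pass/warn/fail status."""
    grouped = defaultdict(list)
    for name, module in modules.items():
        grouped[module['status']].append(name)
    return dict(grouped)


# Two-column modules such as 'Basic Statistics' also convert to a lookup table.
basic = {measure: value for measure, value in modules['Basic Statistics']['data']}

print(rows_as_dicts(modules['Per base N content']))
print(modules_by_status(modules))
print(basic['%GC'])
```

With a real parser object, passing `f.modules` in place of the stand-in dictionary should work the same way, provided the modules have the structure shown above.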

Release history

1.1 (Nov 29, 2018)

1.0 (Jun 22, 2018): this version

Download files

Source Distribution: fastqcparser-1.0.tar.gz (5.4 kB), uploaded Jun 22, 2018

Built Distribution: fastqcparser-1.0-py3-none-any.whl (8.2 kB), uploaded Jun 22, 2018, Python 3

Hashes for fastqcparser-1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `79867730f56e35ed892fcd91183f5e4d55c9fad03ce3c00d55b313c2a08dd8f3` |
| MD5 | `3d047e7a4d6749cb4451b4312ecd4318` |
| BLAKE2b-256 | `8f231b84e093743d2bbfc221fdceb44f6ab1ef7698a5f7517c370de7d37d4432` |

Hashes for fastqcparser-1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `c8955b02168007aa8f6e8b6247d5f598c4f38771c2aa8de6f3814164122946b1` |
| MD5 | `c54b26321f54b095a6c623f1764e99fb` |
| BLAKE2b-256 | `f4a0f3774d91062f6292549cba3d3b96ec9eec09572d939b1169588f3da01e4b` |