marburg-biobank

Interface code to interact with data from the Ovara.net biobank.

Introduction

The marburg_biobank Python module offers a high-level interface to the data sets stored in the [Ovarian Cancer Effusion Biobank and Database](https://www.ovara.net/biobank).

The basic usage is as follows:

import marburg_biobank

# Download the data file from the biobank first.
db = marburg_biobank.OvcaBiobank("marburg_ovca_revision_5.zip")
print(db.list_datasets())
# Wide format: one sample per column, one measured variable per row.
df_wide = db.get_wide('transcriptomics/rnaseq')
# Tall format: one row per data point.
df_tall = db.get_dataset('transcriptomics/rnaseq')

Data formats available

wide

db.get_wide(dataset) returns a pandas DataFrame that looks like this:

Index	Patient12, TAM	Patient12, TU	PatientX, Compartment
VariableA, unitA	23.23	112.2	nan
VariableB, unitB	3.23	12.2	12.7

Caveats: If a dataset has only one compartment, get_wide() omits the compartment information unless get_wide(standardized=True) is used. The same applies to the unit in the index. If the dataset has a ‘name’ column, it is added to the index regardless of the value of standardized.

tall

db.get_dataset(dataset) returns a pandas DataFrame that looks like this:

variable	unit	patient	compartment	value
variableA	unitA	Patient12	TAM	23.23
variableA	unitA	Patient12	TU	112.2
variableB	unitB	Patient13	TAM	3.23
variableB	unitB	Patient13	TU	12.2

This is the internal storage format.
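The relationship between the two layouts can be sketched with plain pandas. This is only an illustration on toy data copied from the tables above, not the code that get_wide() actually runs, and the real column labels returned by the library may be formatted differently.

```python
import pandas as pd

# Toy data in the tall layout shown above (values are illustrative only).
df_tall = pd.DataFrame(
    {
        "variable": ["variableA", "variableA", "variableB", "variableB"],
        "unit": ["unitA", "unitA", "unitB", "unitB"],
        "patient": ["Patient12", "Patient12", "Patient13", "Patient13"],
        "compartment": ["TAM", "TU", "TAM", "TU"],
        "value": [23.23, 112.2, 3.23, 12.2],
    }
)

# One row per (variable, unit), one column per (patient, compartment).
# Combinations that were never measured show up as NaN.
df_wide = df_tall.pivot_table(
    index=["variable", "unit"],
    columns=["patient", "compartment"],
    values="value",
)
print(df_wide)
```

Missing patient/compartment combinations become NaN in the wide layout, which matches the nan entry in the wide example table.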

compartments

Compartments are an abstraction on top of ‘cells’ and ‘bio-liquid’. Examples are tumor-associated macrophages (TAM), tumor cells (TU), ascites, and blood. db.get_compartments() provides a list.

Datasets

Datasets are organized two levels deep. The first level defines the *omics being measured (transcriptomics, proteomics, … or ‘clinical’), while the second level defines the actual method (RNaseq, FACS, …).
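As a small sketch of the two-level naming, the snippet below groups dataset names by their first level. It assumes names are strings of the form "omics/method", as in the two names that appear in this description; a real list would come from db.list_datasets().

```python
from collections import defaultdict

# Names taken from this description; the real list comes from db.list_datasets().
dataset_names = ["transcriptomics/rnaseq", "clinical/survival"]

# Group the method names (second level) under their *omics (first level).
by_omics = defaultdict(list)
for name in dataset_names:
    omics, method = name.split("/", 1)
    by_omics[omics].append(method)

for omics, methods in sorted(by_omics.items()):
    print(omics, methods)
```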

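Survival data is in clinical/survival. Please remember: if you use lifelines (https://pypi.python.org/pypi/lifelines), censored and event are negations of each other. lifelines expects an event-observed indicator, so a censoring flag has to be inverted before it is passed in.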
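A minimal sketch of that inversion, using pandas only. The table and its column names (patient, months, censored) are hypothetical stand-ins; the actual layout of clinical/survival may differ.

```python
import pandas as pd

# Hypothetical survival table; real column names in clinical/survival may differ.
survival = pd.DataFrame(
    {
        "patient": ["Patient12", "Patient13", "Patient14"],
        "months": [14.0, 30.5, 8.2],
        "censored": [False, True, False],
    }
)

# lifelines wants True where the event was observed, i.e. where NOT censored.
survival["event"] = ~survival["censored"]
print(survival)

# With lifelines installed, the fit would then look like:
# from lifelines import KaplanMeierFitter
# KaplanMeierFitter().fit(survival["months"], event_observed=survival["event"])
```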

Excluded patients:

Patients are excluded from our studies on two levels.

On a global level (for example, because their malignancy was not high-grade serous ovarian carcinoma).
On a per-dataset level.

To query which patients are excluded, use db.get_excluded_patients(dataset). dataset may be an empty string, in which case you will receive only the globally excluded patients.

db.get_exclusion_reasons() lists, for each patient (and dataset), why they were excluded.
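Should you need to drop excluded patients from a tall DataFrame yourself, a sketch could look like the following. It assumes that db.get_excluded_patients() returns a collection of patient identifiers matching the patient column; the set used here is a hand-written stand-in. Whether the retrieval functions already apply exclusions is not stated in this description.

```python
import pandas as pd

# Toy tall-format data (illustrative values only).
df_tall = pd.DataFrame(
    {
        "variable": ["variableA", "variableA", "variableB"],
        "patient": ["Patient12", "Patient13", "Patient13"],
        "compartment": ["TAM", "TAM", "TU"],
        "value": [23.23, 3.23, 12.2],
    }
)

# Stand-in for the result of db.get_excluded_patients(dataset) (assumed shape).
excluded = {"Patient13"}

# Keep only rows whose patient is not in the excluded set.
df_kept = df_tall[~df_tall["patient"].isin(excluded)]
print(df_kept)
```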

Release history

Version	Release date
0.156	Feb 23, 2022
0.155	Sep 14, 2021
0.154	May 7, 2021
0.153	May 7, 2021
0.152	May 7, 2021
0.151	Apr 21, 2021
0.150	Apr 21, 2021
0.149	Apr 14, 2021
0.148	Apr 14, 2021
0.147	Jan 29, 2021
0.146	Jan 29, 2021
0.145	Jan 29, 2021
0.144	Jan 29, 2021
0.143	Jan 29, 2021
0.142	Oct 29, 2020
0.141	Oct 28, 2020
0.140	Sep 1, 2020
0.139	Jun 9, 2020
0.138	Jun 9, 2020
0.137	Apr 28, 2020
0.135	Apr 28, 2020
0.134	Apr 22, 2020
0.133	Apr 22, 2020
0.132	Apr 22, 2020
0.131	Mar 19, 2020
0.130	Dec 9, 2019
0.129	Nov 20, 2019
0.128	Nov 20, 2019
0.127	Nov 15, 2019
0.124	Aug 27, 2019
0.122	Aug 26, 2019
0.121	May 29, 2019
0.120	May 29, 2019
0.117	May 3, 2019
0.116	May 3, 2019
0.115	Apr 11, 2018
0.114	Apr 11, 2018
0.113	Jan 9, 2018
0.112	Jan 2, 2018
0.111	Jan 2, 2018
0.109	Jan 2, 2018
0.108	Jan 2, 2018
0.107	Jan 2, 2018
0.106	Jan 2, 2018
0.105	Jan 2, 2018
0.104	Oct 9, 2017
0.103	Sep 12, 2017
0.102 (this version)	Sep 12, 2017
0.101	Sep 12, 2017
0.11	Jan 2, 2018
0.1	Sep 12, 2017

Download files

Source Distribution

marburg_biobank-0.102.tar.gz (8.5 kB)

Uploaded Sep 12, 2017 Source

Built Distribution

marburg_biobank-0.102-py2.py3-none-any.whl (9.9 kB)

Uploaded Sep 12, 2017 Python 2 Python 3

Hashes for marburg_biobank-0.102.tar.gz

Algorithm	Hash digest
SHA256	`599148d075b96a4121b4b9a438fb7f581086b6e8cb0a1e0f15a5dd80bb3c04c3`
MD5	`7bb32de987c9141b6431974df4100487`
BLAKE2b-256	`3068def7cd838667075b0921517c0df73b5ce3db634806aaa62366b84abae8c1`

Hashes for marburg_biobank-0.102-py2.py3-none-any.whl

Algorithm	Hash digest
SHA256	`7ed4d31f10a0afc52588ccbdd65ce9bad0e4602e00f5a51f4aced36c7b2aec6f`
MD5	`214f8be48f51a0e2ce54b2527d679273`
BLAKE2b-256	`e555001b661a12c6a912a66fb1b1a2cd18a4884987ff06db24a5a6677ee7fc44`
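
The digests above can be used to check a downloaded file. The following is a minimal sketch using only the standard library; the helper names sha256_of and matches are made up for this illustration, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 of the source distribution, copied from the table above.
SDIST_SHA256 = "599148d075b96a4121b4b9a438fb7f581086b6e8cb0a1e0f15a5dd80bb3c04c3"

# Usage, assuming the file was saved in the current directory:
# print(matches("marburg_biobank-0.102.tar.gz", SDIST_SHA256))
```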