marburg-biobank

Interface code to interact with data from the Ovara.net biobank.


marburg\_biobank
================

Introduction
------------

The marburg\_biobank Python module offers a high-level interface to the
data sets stored in the `Ovarian Cancer Effusion Biobank and
Database <https://www.ovara.net/biobank>`__.

The basic usage is as follows:

.. code:: python

    import marburg_biobank

    # You need to download this file from your biobank.
    db = marburg_biobank.OvcaBiobank("marburg_ovca_revision_5.zip")
    print(db.list_datasets())
    # One sample per column, one measured variable per row:
    df_wide = db.get_wide('transcriptomics/rnaseq')
    # One row per data point:
    df_tall = db.get_dataset('transcriptomics/rnaseq')

Data formats available
----------------------

wide
~~~~

``db.get_wide(dataset)`` returns a pandas DataFrame that looks like this:

+------------------------+------------------+-----------------+-------------------------+
| Index                  | Patient12, TAM   | Patient12, TU   | PatientX, Compartment   |
+========================+==================+=================+=========================+
| **VariableA, unitA**   | 23.23            | 112.2           | nan                     |
+------------------------+------------------+-----------------+-------------------------+
| **VariableB, unitB**   | 3.23             | 12.2            | 12.7                    |
+------------------------+------------------+-----------------+-------------------------+

Caveats: if a dataset has only one compartment, ``get_wide()`` omits the
compartment from the column labels unless ``get_wide(standardized=True)``
is used. The same applies to the unit in the index. If the dataset has a
'name' column, it is added to the index regardless of the value of
``standardized``.
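
The caveat can be restated as a small labelling rule. The sketch below is a
plain-pandas illustration, not the package's implementation: the helper
``wide_labels`` is hypothetical, and it assumes the tall-format column names
shown in the next section (``variable``, ``unit``, ``patient``,
``compartment``).

.. code:: python

    from typing import List

    import pandas as pd


    def wide_labels(tall: pd.DataFrame, main: str, extra: str,
                    standardized: bool = False) -> List[str]:
        """Hypothetical helper restating the documented get_wide() caveat.

        Builds "main, extra" labels, but drops the extra part when it takes
        a single value in the dataset and standardized is False.
        """
        pairs = tall[[main, extra]].drop_duplicates()
        if pairs[extra].nunique() == 1 and not standardized:
            return sorted(pairs[main].unique())
        return sorted(f"{a}, {b}" for a, b in pairs.itertuples(index=False))


    # Toy rows in the tall layout: a single compartment (TU) and a single unit.
    tall = pd.DataFrame({
        "variable": ["variableA", "variableB"],
        "unit": ["unitA", "unitA"],
        "patient": ["Patient12", "Patient12"],
        "compartment": ["TU", "TU"],
        "value": [112.2, 12.2],
    })

    print(wide_labels(tall, "patient", "compartment"))
    # ['Patient12']  (compartment omitted)
    print(wide_labels(tall, "patient", "compartment", standardized=True))
    # ['Patient12, TU']
    print(wide_labels(tall, "variable", "unit"))
    # ['variableA', 'variableB']  (unit omitted)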

tall
~~~~

``db.get_dataset(dataset)`` returns a pandas DataFrame that looks like this:

+-------------+---------+-------------+---------------+---------+-----------------------+
| variable    | unit    | patient     | compartment   | value   | optional columns...   |
+=============+=========+=============+===============+=========+=======================+
| variableA   | unitA   | Patient12   | TAM           | 23.23   |                       |
+-------------+---------+-------------+---------------+---------+-----------------------+
| variableA   | unitA   | Patient12   | TU            | 112.2   |                       |
+-------------+---------+-------------+---------------+---------+-----------------------+
| variableB   | unitB   | Patient13   | TAM           | 3.23    |                       |
+-------------+---------+-------------+---------------+---------+-----------------------+
| variableB   | unitB   | Patient13   | TU            | 12.2    |                       |
+-------------+---------+-------------+---------------+---------+-----------------------+

This is the internal storage format.
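
Because the tall format is what is stored, the wide layout can be thought of
as a pivot of it. The following is a minimal sketch in plain pandas using the
rows from the table above; the ``row``/``col`` label columns are illustrative
assumptions, and ``get_wide()`` may build its labels differently (see the
caveats for the wide format).

.. code:: python

    import pandas as pd

    # Toy rows copied from the tall table above.
    tall = pd.DataFrame({
        "variable": ["variableA", "variableA", "variableB", "variableB"],
        "unit": ["unitA", "unitA", "unitB", "unitB"],
        "patient": ["Patient12", "Patient12", "Patient13", "Patient13"],
        "compartment": ["TAM", "TU", "TAM", "TU"],
        "value": [23.23, 112.2, 3.23, 12.2],
    })

    # Illustrative combined labels: "variable, unit" rows and
    # "patient, compartment" columns, as in the wide table.
    labelled = tall.assign(
        row=tall["variable"] + ", " + tall["unit"],
        col=tall["patient"] + ", " + tall["compartment"],
    )

    # One row per variable, one column per sample; combinations that were
    # never measured become NaN.
    wide = labelled.pivot(index="row", columns="col", values="value")
    print(wide)

``DataFrame.pivot()`` raises a ``ValueError`` if a (row, column) pair occurs
more than once, which doubles as a sanity check that each data point appears
only once in the tall frame.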

Compartments
------------

Compartments are an abstraction on top of 'cells' and 'bio-liquids'.
Examples are tumor-associated macrophages (TAM), tumor cells (TU),
ascites, blood, ... ``db.get_compartments()`` returns the list of
available compartments.
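
Since every tall-format row carries a ``compartment`` value, selecting or
comparing compartments is a plain pandas operation. The following sketch uses
toy rows copied from the tall table above (not real biobank data) and assumes
that column layout. It filters to TAM and then places each patient's TAM and
TU measurements side by side.

.. code:: python

    import pandas as pd

    # Toy rows copied from the tall table above (not real biobank data).
    tall = pd.DataFrame({
        "variable": ["variableA", "variableA", "variableB", "variableB"],
        "unit": ["unitA", "unitA", "unitB", "unitB"],
        "patient": ["Patient12", "Patient12", "Patient13", "Patient13"],
        "compartment": ["TAM", "TU", "TAM", "TU"],
        "value": [23.23, 112.2, 3.23, 12.2],
    })

    # Compartments present in this particular frame.
    print(sorted(tall["compartment"].unique()))  # ['TAM', 'TU']

    # Keep only tumor-associated macrophage measurements.
    tam_only = tall[tall["compartment"] == "TAM"]

    # One row per (patient, variable), one column per compartment.
    by_compartment = (
        tall.set_index(["patient", "variable", "compartment"])["value"]
        .unstack("compartment")
    )
    print(by_compartment)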

Datasets
--------

Datasets are organized in two levels. The first level names the
\*omics being measured (transcriptomics, proteomics, ..., or 'clinical');
the second level names the actual method (RNA-seq, FACS, ...).
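
Assuming ``db.list_datasets()`` returns names of the form
``'<omics>/<method>'`` (as ``'transcriptomics/rnaseq'`` in the usage example
suggests), the names can be grouped by their first level. The list below is a
hypothetical stand-in for the real return value.

.. code:: python

    from collections import defaultdict

    # Hypothetical stand-in for db.list_datasets(); both names appear in
    # this README, but the real list will be longer.
    dataset_names = ["transcriptomics/rnaseq", "clinical/survival"]

    methods_by_omics = defaultdict(list)
    for name in dataset_names:
        omics, method = name.split("/", 1)  # first level / second level
        methods_by_omics[omics].append(method)

    print(dict(methods_by_omics))
    # {'transcriptomics': ['rnaseq'], 'clinical': ['survival']}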

Survival data is in ``clinical/survival``. Please remember: if you use
`lifelines <https://pypi.python.org/pypi/lifelines>`__, censored and
event are negations of each other.
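
The following is a minimal sketch of that conversion. It assumes the survival
data has already been reshaped to one row per patient with a boolean
``censored`` column; the column names and values are made up for
illustration. lifelines' ``KaplanMeierFitter.fit()`` takes the event
indicator through its ``event_observed`` argument.

.. code:: python

    import pandas as pd

    # Hypothetical one-row-per-patient survival table; column names and
    # values are made up for illustration.
    survival = pd.DataFrame({
        "patient": ["Patient12", "Patient13", "PatientX"],
        "days": [400, 120, 900],
        "censored": [False, False, True],
    })

    # lifelines wants "was the event observed?", i.e. NOT censored.
    survival["event_observed"] = ~survival["censored"]
    print(survival)

    # With lifelines installed:
    # from lifelines import KaplanMeierFitter
    # KaplanMeierFitter().fit(survival["days"],
    #                         event_observed=survival["event_observed"])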

Excluded patients
-----------------

Patients are excluded from our studies at two levels:

- at the global level (for example, because their malignancy was not
  high-grade serous ovarian carcinoma);
- at the per-dataset level.

To query which patients are excluded, use
``db.get_excluded_patients(dataset)``. ``dataset`` may be an empty string,
in which case only the globally excluded patients are returned.

``db.get_exclusion_reasons()`` lists, for each patient (and dataset), why
they were excluded.
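
The following is a sketch of dropping excluded patients from a tall-format
frame. It assumes ``get_excluded_patients()`` returns an iterable of patient
identifiers that match the ``patient`` column. The exclusion sets below are
hypothetical placeholders. Taking the union of the global and per-dataset sets
is a defensive choice, because it is not stated whether the per-dataset result
already includes the global exclusions.

.. code:: python

    import pandas as pd

    # Toy rows copied from the tall table above.
    tall = pd.DataFrame({
        "variable": ["variableA", "variableA", "variableB", "variableB"],
        "unit": ["unitA", "unitA", "unitB", "unitB"],
        "patient": ["Patient12", "Patient12", "Patient13", "Patient13"],
        "compartment": ["TAM", "TU", "TAM", "TU"],
        "value": [23.23, 112.2, 3.23, 12.2],
    })

    # Hypothetical placeholders for db.get_excluded_patients("") (global)
    # and db.get_excluded_patients(dataset) (per dataset).
    globally_excluded = {"Patient13"}
    dataset_excluded = set()

    excluded = globally_excluded | dataset_excluded
    kept = tall[~tall["patient"].isin(excluded)]
    print(kept)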


Release history

- 0.156 (Feb 23, 2022)
- 0.155 (Sep 14, 2021)
- 0.154 (May 7, 2021)
- 0.153 (May 7, 2021)
- 0.152 (May 7, 2021)
- 0.151 (Apr 21, 2021)
- 0.150 (Apr 21, 2021)
- 0.149 (Apr 14, 2021)
- 0.148 (Apr 14, 2021)
- 0.147 (Jan 29, 2021)
- 0.146 (Jan 29, 2021)
- 0.145 (Jan 29, 2021)
- 0.144 (Jan 29, 2021)
- 0.143 (Jan 29, 2021)
- 0.142 (Oct 29, 2020)
- 0.141 (Oct 28, 2020)
- 0.140 (Sep 1, 2020)
- 0.139 (Jun 9, 2020)
- 0.138 (Jun 9, 2020)
- 0.137 (Apr 28, 2020)
- 0.135 (Apr 28, 2020)
- 0.134 (Apr 22, 2020)
- 0.133 (Apr 22, 2020)
- 0.132 (Apr 22, 2020)
- 0.131 (Mar 19, 2020)
- 0.130 (Dec 9, 2019)
- 0.129 (Nov 20, 2019)
- 0.128 (Nov 20, 2019)
- 0.127 (Nov 15, 2019)
- 0.124 (Aug 27, 2019)
- 0.122 (Aug 26, 2019)
- 0.121 (May 29, 2019)
- 0.120 (May 29, 2019)
- 0.117 (May 3, 2019)
- 0.116 (May 3, 2019)
- 0.115 (Apr 11, 2018)
- 0.114 (Apr 11, 2018)
- 0.113 (Jan 9, 2018)
- 0.112 (Jan 2, 2018)
- 0.111 (Jan 2, 2018)
- 0.109 (Jan 2, 2018)
- 0.108 (Jan 2, 2018)
- 0.107 (Jan 2, 2018)
- 0.106 (Jan 2, 2018)
- 0.105 (Jan 2, 2018)
- 0.104 (Oct 9, 2017)
- 0.103 (Sep 12, 2017)
- 0.102 (Sep 12, 2017)
- 0.101 (Sep 12, 2017)
- 0.11 (Jan 2, 2018)
- 0.1 (Sep 12, 2017), this version

Download files


Source distribution: ``marburg_biobank-0.1.tar.gz`` (8.4 kB), uploaded
Sep 12, 2017.

Built distribution: ``marburg_biobank-0.1-py2.py3-none-any.whl`` (9.8 kB),
uploaded Sep 12, 2017, for Python 2 and Python 3.

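Hashes for ``marburg_biobank-0.1.tar.gz``:

=========== ====================================================================
Algorithm   Hash digest
=========== ====================================================================
SHA256      ``24a509b330a3fff6bb9b8fe30f81118dfd4333b7222fce3a543b9877b0990a5a``
MD5         ``ba70a05626c1966c9e6ea85c8151555b``
BLAKE2b-256 ``447fd4de48605ae82e2dd250b21b7c8ceb7fcafd20f201ba8232a7267665260b``
=========== ====================================================================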

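Hashes for ``marburg_biobank-0.1-py2.py3-none-any.whl``:

=========== ====================================================================
Algorithm   Hash digest
=========== ====================================================================
SHA256      ``dac16674df44ab7acf63651a13b203ebbbcea36e880b863ae553822c45af7589``
MD5         ``89fe7f28b40d8e0731167f59c9f600e5``
BLAKE2b-256 ``3a814ba660e2a753449516904396661ff6af2055a5570f22e4263b4e99deba51``
=========== ====================================================================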