fmrib-unpack

The FMRIB UKBiobank Normalisation, Processing, And Cleaning Kit

Project description

DOI: 10.5281/zenodo.1997626

FUNPACK is a Python library for pre-processing of UK BioBank data.

FUNPACK is developed at the Wellcome Centre for Integrative Neuroimaging (WIN@FMRIB), University of Oxford. FUNPACK is in no way endorsed, sanctioned, or validated by the UK BioBank.

FUNPACK comes bundled with metadata about the variables present in UK BioBank data sets. This metadata can be obtained from the UK BioBank online data showcase.

Installation

Install FUNPACK via pip:

pip install fmrib-unpack

Or from conda-forge:

conda install -c conda-forge fmrib-unpack

Introductory notebook

The funpack_demo command will start a Jupyter Notebook which introduces the main features provided by FUNPACK. To run it, you need to install a few additional dependencies:

pip install fmrib-unpack[demo]

You can then start the demo by running funpack_demo.

Usage

General usage is as follows:

funpack [options] output.tsv input1.tsv input2.tsv

You can get information on all of the options by typing funpack --help.

Options can be specified on the command line, and/or stored in a configuration file. For example, the options in the following command line:

funpack \
  --overwrite \
  --import_all \
  --log_file log.txt \
  --icd10_map_file icd_codes.tsv \
  --category 10 \
  --category 11 \
  output.tsv input1.tsv input2.tsv

Could be stored in a configuration file config.txt:

overwrite
import_all
log_file       log.txt
icd10_map_file icd_codes.tsv
category       10
category       11

And then executed as follows:

funpack -cfg config.txt output.tsv input1.tsv input2.tsv
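
The correspondence between the two forms can be sketched in Python. This is only an illustration of the mapping shown above, not FUNPACK's own parser: the helper name config_to_args is hypothetical, and it assumes that each non-empty line holds a long option name (without the leading dashes), optionally followed by whitespace and a value.

```python
def config_to_args(config_text):
    """Translate config-file lines into equivalent command-line arguments.

    Illustrative only: assumes one option per line, written as the long
    option name without leading dashes, optionally followed by a value.
    """
    args = []
    for line in config_text.splitlines():
        parts = line.split(None, 1)  # option name, then optional value
        if not parts:
            continue  # skip blank lines
        args.append('--' + parts[0])
        if len(parts) > 1:
            args.append(parts[1].strip())
    return args


# The contents of config.txt from the example above
config = """
overwrite
import_all
log_file       log.txt
icd10_map_file icd_codes.tsv
category       10
category       11
"""

print(config_to_args(config))
```

Repeated options such as category simply appear once per line, in the same way that they are repeated on the command line.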

Customising

FUNPACK contains a large number of built-in rules which have been specifically written to pre-process UK BioBank data variables. These rules are stored in the following files:

funpack/data/variables_*.tsv: Cleaning rules for individual variables

funpack/data/datacodings_*.tsv: Cleaning rules for data codings

funpack/data/types.tsv: Cleaning rules for specific types

funpack/data/processing.tsv: Processing steps

You can customise or replace these files as you see fit. You can also pass your own versions of these files to FUNPACK via the --variable_file, --datacoding_file, --type_file and --processing_file command-line options respectively. FUNPACK will load all variable and datacoding files, and merge them into a single table which contains the cleaning rules for each variable.

Finally, you can use the --no_builtins option to bypass all of the built-in cleaning and processing rules.
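
As a rough sketch, a command line that uses custom rule files could be assembled from Python as follows. The option names are the ones listed above; the helper build_funpack_command and the file names my_variables.tsv and my_processing.tsv are hypothetical, and the sketch assumes that --variable_file and --datacoding_file may be given more than once. It only builds and prints the command, and does not run FUNPACK.

```python
import shlex


def build_funpack_command(output, inputs, variable_files=(),
                          datacoding_files=(), type_file=None,
                          processing_file=None, no_builtins=False):
    """Assemble a funpack command line that uses custom rule files."""
    cmd = ['funpack']
    if no_builtins:
        cmd.append('--no_builtins')
    # Assumption: these two options can be repeated, one file per use
    for path in variable_files:
        cmd += ['--variable_file', path]
    for path in datacoding_files:
        cmd += ['--datacoding_file', path]
    if type_file is not None:
        cmd += ['--type_file', type_file]
    if processing_file is not None:
        cmd += ['--processing_file', processing_file]
    return cmd + [output] + list(inputs)


cmd = build_funpack_command(
    'output.tsv', ['input1.tsv', 'input2.tsv'],
    variable_files=['my_variables.tsv'],   # hypothetical file name
    processing_file='my_processing.tsv',   # hypothetical file name
    no_builtins=True)

# Print the command for inspection; pass cmd to subprocess.run to execute it
print(shlex.join(cmd))
```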

Output

The main output of FUNPACK is a plain-text tab-delimited file which contains the input data after cleaning and processing, potentially with some columns removed and new columns added.

If you used the --non_numeric_file option, the main output file will only contain the numeric columns; non-numeric columns will be saved to a separate file.

You can use any tool of your choice to load this output file, such as Python, MATLAB, or Excel. It is also possible to pass the output back into FUNPACK.
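
For Python, a minimal standard-library sketch might look like the following. The helper load_funpack_output is hypothetical, the sample rows and the column names 1-0.0 and 2-0.0 are made up for illustration, and the sketch assumes that missing values are written as empty cells.

```python
import csv
import os
import tempfile


def load_funpack_output(path):
    """Load a tab-delimited file into {column name: list of values}.

    Cells that parse as numbers become floats, empty cells become None,
    and anything else is kept as a string.
    """
    def convert(cell):
        if cell == '':
            return None  # assumption: empty cell means missing value
        try:
            return float(cell)
        except ValueError:
            return cell

    with open(path, newline='') as f:
        reader = csv.reader(f, delimiter='\t')
        header = next(reader)
        columns = {name: [] for name in header}
        for row in reader:
            for name, cell in zip(header, row):
                columns[name].append(convert(cell))
    return columns


# Made-up sample standing in for real FUNPACK output
sample = 'eid\t1-0.0\t2-0.0\n1\t31.5\ta\n2\t\tb\n'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'out.tsv')
    with open(path, 'w') as f:
        f.write(sample)
    data = load_funpack_output(path)

print(data['1-0.0'])
```

If pandas is installed, pandas.read_csv(path, sep='\t', index_col=0) is a common alternative which also handles column types and missing values.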

Loading output into MATLAB

If you are using MATLAB, you have several options for loading the FUNPACK output. The best option is readtable, which will load column names, and will handle both non-numeric data and missing values. Use readtable like so:

data = readtable('out.tsv', 'FileType', 'text');

The readtable function returns a table object, which stores each column as a separate vector (or cell-array for non-numeric columns). If you are only interested in numeric columns, you can retrieve them as an array like this:

data    = data(:, vartype('numeric'));
rawdata = data.Variables;

The readtable function may rename columns to ensure that their names are valid MATLAB identifiers. You can retrieve the original names from the table object like so:

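% Renamed columns keep their original heading in VariableDescriptions
colnames        = data.Properties.VariableDescriptions;
% Extract the quoted original name from each description
colnames        = regexp(colnames, '''(.+)''', 'tokens', 'once');
% Columns that were not renamed have no description; use their name as-is
empty           = cellfun(@isempty, colnames);
colnames(empty) = data.Properties.VariableNames(empty);
colnames        = vertcat(colnames{:});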

If you have used the --description_file option, you can load in the descriptions for each column as follows:

descs = readtable('descriptions.tsv', ...
                  'FileType', 'text', ...
                  'Delimiter', '\t',  ...
                  'ReadVariableNames',false);
descs = [descs; {'eid', 'ID'}];
idxs  = cellfun(@(x) find(strcmp(descs.Var1, x)), colnames, ...
                'UniformOutput', false);
idxs  = cell2mat(idxs);
descs = descs.Var2(idxs);
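
A comparable lookup can be sketched in Python. As in the MATLAB code above, this assumes that the description file has no header row and holds two tab-separated columns (column name, then description), and that an entry for eid has to be added by hand. The helper load_descriptions and the sample descriptions are made up for illustration.

```python
import csv
import io


def load_descriptions(f):
    """Read (column name, description) pairs from a tab-delimited file object."""
    descs = {'eid': 'ID'}  # mirrors the entry added by hand in the MATLAB code
    for row in csv.reader(f, delimiter='\t'):
        if len(row) >= 2:
            descs[row[0]] = row[1]
    return descs


# Made-up sample; a real file is produced by the --description_file option
sample = io.StringIO(
    '1-0.0\tFirst example variable\n'
    '2-0.0\tSecond example variable\n')
descs = load_descriptions(sample)

# Look up the description of each column, in column order
colnames = ['eid', '1-0.0', '2-0.0']
print([descs[name] for name in colnames])
```

To read a real file, pass an open file object instead, for example open('descriptions.tsv', newline='').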

Tests

To run the test suite, you need to install some additional dependencies:

pip install fmrib-unpack[test]

Then you can run the test suite using pytest:

pytest

Citing

If you would like to cite FUNPACK, please refer to its Zenodo page.

Release history

Version	Release date
3.8.0	Dec 13, 2023
3.7.1	Sep 22, 2023
3.7.0	Apr 17, 2023
3.6.0	Feb 3, 2023
3.5.2	Aug 18, 2022
3.5.0	Aug 5, 2022
3.4.0	Jul 29, 2022
3.3.1	Jun 28, 2022
3.3.0	Jun 27, 2022
3.2.3	Jun 2, 2022
3.2.2	May 31, 2022
3.2.1	May 31, 2022
3.2.0	May 13, 2022
3.1.0	May 6, 2022
3.0.0	Jan 5, 2022
2.9.1	Dec 29, 2021
2.9.0	Dec 28, 2021
2.8.0	Aug 19, 2021
2.7.1	Jun 22, 2021
2.7.0	May 14, 2021
2.6.0	Mar 29, 2021
2.5.2	Mar 15, 2021
2.5.1	Mar 3, 2021
2.5.0	Dec 9, 2020
2.4.0	Nov 27, 2020
2.3.3	Oct 5, 2020
2.3.2	Jun 10, 2020
2.3.1	May 27, 2020
2.3.0	May 13, 2020
2.1.0	Apr 22, 2020
2.0.0	Apr 7, 2020
1.9.0	Feb 28, 2020
1.8.2	Feb 27, 2020
1.8.1	Feb 19, 2020
1.8.0	Feb 18, 2020
1.7.1	Jan 30, 2020
1.7.0	Jan 24, 2020
1.6.0	Dec 12, 2019
1.5.0	Dec 9, 2019
1.4.5	Dec 5, 2019
1.4.2	Oct 22, 2019
1.4.1	Jul 8, 2019
1.4.0	Jul 7, 2019
1.3.2	Jun 4, 2019
1.3.1	May 30, 2019
1.3.0	May 29, 2019
1.2.1	May 28, 2019
1.2.0	May 25, 2019
1.1.4	May 17, 2019
1.1.3	May 17, 2019
1.1.2	May 16, 2019
1.1.0	May 14, 2019
1.0.2	May 14, 2019 (this version)
1.0.1	May 10, 2019
1.0.0	May 10, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fmrib-unpack-1.0.2.tar.gz (1.5 MB)

Uploaded May 14, 2019 Source

Built Distribution

fmrib_unpack-1.0.2-py3-none-any.whl (1.5 MB)

Uploaded May 14, 2019 Python 3

Hashes for fmrib-unpack-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`66d2b39e370a4ea6e1cde0a2b941bb80faf96104807eee5ceaf5e9cd9e05e584`
MD5	`ad9a1edef5e18e20e7258e5c93d82367`
BLAKE2b-256	`d36152d2aa003eb32fff4f549a2e01505c438d869a44f5906c951a60cac2f9c4`

Hashes for fmrib_unpack-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0e3da9c836f0cd0ebdf409b5f1d358bc2f1c4eae78122dc8dcec09c817181dc3`
MD5	`73fece4a85c46eba7282acc7cdb38889`
BLAKE2b-256	`0983b54c76c52441b057cea71fa3b73384a0586c0ffabb3bbcb6c1ef18afa2d4`
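
A downloaded file can be checked against the digests listed above with Python's hashlib module. The following is a minimal sketch: the helper file_digests is hypothetical, and the final check only runs if the wheel happens to be present in the current directory.

```python
import hashlib
import os


def file_digests(path, chunk_size=1 << 20):
    """Return the SHA256 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        'SHA256': hashlib.sha256(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest
        'BLAKE2b-256': hashlib.blake2b(digest_size=32),
    }
    with open(path, 'rb') as f:
        # Read in chunks so that large files need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b''):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


wheel = 'fmrib_unpack-1.0.2-py3-none-any.whl'
expected = '0e3da9c836f0cd0ebdf409b5f1d358bc2f1c4eae78122dc8dcec09c817181dc3'

if os.path.exists(wheel):
    print(file_digests(wheel)['SHA256'] == expected)
```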