
Neuroimaging Predictive Analysis


# neuropredict

Automatic estimation of predictive power of commonly used neuroimaging features as well as user-defined features.

The aim of this Python module is to assess the predictive power of commonly used neuroimaging features (such as resting-state connectivity, fractional anisotropy, subcortical volumes and cortical thickness), read automatically from the outputs of popular tools such as FSL, DTIstudio, AFNI and Freesurfer, and to present a comprehensive report on a given dataset. It is mainly aimed at clinical users (to lower or remove the barriers for them) who would like to understand which features and brain regions are discriminative in their shiny new dataset before diving into the deep grey sea of feature extraction and optimization.

PS: It sounds similar (on the surface) to other available software; however, it aims to lower the barriers even further, or remove them altogether! All the user needs to provide are commonly used features (such as a Freesurfer output directory). In return, they obtain an easy-to-read report (see below), along with a well-packaged export of performance metrics (for sharing and post hoc comparison) on the predictive power of the features they are interested in.

![composite](docs/composite_flyer.001.png)

**Table of Contents**

- [neuropredict](#neuropredict)
- [FAQ](#faq)
- [Context](#context)
- [Predictive analysis](#predictive-analysis)
- [Report](#report)
- [Example](#example)
- [Input Features](#input-features)
- [Arbitrary feature input](#arbitrary-feature-input)
- [Automatic readers currently supported](#automatic-readers-currently-supported)
- [Automatic readers in development (stay tuned)](#automatic-readers-in-development-stay-tuned)
- [Installation](#installation)
- [Usage](#usage)
- [Dependencies](#dependencies)

## FAQ

Refer to the [FAQ](FAQ.md).

## Context

Imagine you have just acquired a wonderful new dataset with a certain number of diseased patients and healthy controls. In the case of T1 MRI analysis, you typically start by preprocessing it with your favourite software (such as Freesurfer), which produces a ton of segmentations and statistics within them (such as their volumes and cortical thickness). A typical scenario would be to examine group differences (e.g. between controls and disease_one, or between controls and other_disease), find the most discriminative variables and/or their brain regions, and report how they relate to known cognitive or neuropsychological measures. This analysis and the resulting insights are necessary and tell us more about the dataset. However, that is not the fullest extent of the analysis one could perform: association studies do not inform us of the predictive utility of the aforementioned discriminative variables or regions, which needs to be investigated independently.

## Predictive analysis
Conducting a machine learning study (to assess the predictive utility of different regions, features or methods) is not trivial. In the simplest case, it requires one to understand standard techniques, learn one or two toolboxes and do the complex programming necessary to interface their data with an ML toolbox (even with the help of well-written packages like nilearn that are meant for neuroimaging analysis). In addition, to properly evaluate the performance, the user needs a good grasp of best practices in machine learning. Even if the user could produce certain numbers out of a black-box toolbox, more programming is necessary to make sense of the results and produce the plots needed for publications.

## Report
Neuropredict is here to remove those barriers and make your life easier!

All you need to do is take care of preprocessing and produce quality-controlled output through popular software, and neuropredict will produce a comprehensive report (see figures below) of distributions of cross-validated performance, confusion matrices, an analysis of misclassifications and an intuitive comparison across multiple features.
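To make the terms "confusion matrix" and "misclassification" concrete, here is a minimal sketch in plain Python. It is not neuropredict's own code, and the labels and predictions are made up for illustration: each row of the matrix is a true class, each column a predicted class.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(true_labels, predicted_labels))
    return [[counts[(t, p)] for p in classes] for t in classes]


def misclassification_rates(matrix, classes):
    """Fraction of each true class that was assigned to any other class."""
    rates = {}
    for i, cls in enumerate(classes):
        total = sum(matrix[i])
        rates[cls] = 1.0 - matrix[i][i] / total if total else float("nan")
    return rates


# Made-up labels and predictions for five test subjects (illustration only).
classes = ["controls", "disease_one"]
true_labels = ["controls", "controls", "controls", "disease_one", "disease_one"]
predicted = ["controls", "controls", "disease_one", "disease_one", "controls"]

matrix = confusion_matrix(true_labels, predicted, classes)
rates = misclassification_rates(matrix, classes)
print(matrix)
print(rates)
```

In a repeated cross-validation, one such matrix would be obtained per repetition, which is what makes a distribution of performance possible.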

## Example
For example, if you have a dataset with the following three classes: 5 controls, 6 disease_one and 9 other_disease, all you would need to do is produce a metadata file as shown below (specifying a class label for each subject):

```
3071,controls
3069,controls
3064,controls
3063,controls
3057,controls
5004,disease_one
5074,disease_one
5077,disease_one
5001,disease_one
5002,disease_one
5003,disease_one
5000,other_disease
5006,other_disease
5013,other_disease
5014,other_disease
5016,other_disease
5018,other_disease
5019,other_disease
5021,other_disease
5022,other_disease
```


and `neuropredict` will produce the figures (and the numbers in CSV files) as shown here:

![composite](docs/composite_flyer.001.png)

The higher resolution PDFs are included in the [docs](docs) folder.

I hope this user-friendly tool will help you get started on the predictive analysis you've been wanting to do for a while.
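If your subject lists already live in Python, the metadata file above can be generated rather than typed by hand. The sketch below is one possible way to do it and is not part of neuropredict: the helper names, the file name `meta_data.csv`, the subset of subjects and the Freesurfer path are all assumptions for illustration. It also assembles (without running) a command line using the `-m`, `-o` and `-f` flags documented in the usage section.

```python
import csv
import tempfile
from pathlib import Path


def write_metadata(path, subjects_by_class):
    """Write one 'subject_id,class_label' row per subject."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, lineterminator="\n")
        for label, subject_ids in subjects_by_class.items():
            for subject_id in subject_ids:
                writer.writerow([subject_id, label])


def build_command(meta_file, out_dir, fs_dir):
    """Assemble, but do not run, a neuropredict call."""
    return [
        "neuropredict",
        "-m", str(Path(meta_file).resolve()),  # the help text asks for an absolute path
        "-o", str(out_dir),
        "-f", str(fs_dir),
    ]


# A subset of the subjects from the example above.
subjects_by_class = {
    "controls": ["3071", "3069"],
    "disease_one": ["5004", "5074"],
    "other_disease": ["5000", "5006"],
}

work_dir = Path(tempfile.mkdtemp())
meta_file = work_dir / "meta_data.csv"  # hypothetical file name
write_metadata(meta_file, subjects_by_class)

# "/project/freesurfer_subjects" is a placeholder for your SUBJECTS_DIR.
command = build_command(meta_file, work_dir / "results", "/project/freesurfer_subjects")
print(meta_file.read_text())
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` once neuropredict is installed and the paths point at real data.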

# Input Features

neuropredict is aimed at interfacing directly with popular feature extraction tools such as Freesurfer, FSL and others (see the sections on automatic readers below). However, it also accepts arbitrary features that have already been extracted via the user's own pipeline(s).

## Arbitrary feature input
For custom input:
* save the features for all subjects in a single folder (let's call it `/project/myawsomepipeline`),
* specify it with `--userdir /project/myawsomepipeline` (the flag listed in the usage section),
* within that folder, store the features for each subject in a separate folder (named after its ID in the metadata file),
* in a file called `features.txt`, which must contain a single floating point number per line (see below; it is not comma separated),
* and make sure all subjects have an equal number of features.

neuropredict will then automatically consolidate the features into its native [`pyradigm` MLdataset format](https://github.com/raamana/pyradigm), which is well suited to predictive analysis tasks.

An example for a dataset with 2 controls and 2 disease subjects, with 5 features each, is shown below:
```
$ 11:19:22 linux userdefined >> ls -1
control-001
control-002
disease-003
disease-004
$ 11:19:30 linux userdefined >> tree
.
|-- control-001
|   `-- features.txt
|-- control-002
|   `-- features.txt
|-- disease-003
|   `-- features.txt
`-- disease-004
    `-- features.txt

4 directories, 4 files
$ 11:19:33 linux userdefined >> head -n 5 */features.txt
==> control-001/features.txt <==
0.868896136902
0.542305564899
0.115903893374
0.503297862357
0.564961631104

==> control-002/features.txt <==
0.868896136902
0.542305564899
0.115903893374
0.503297862357
0.564961631104

==> disease-003/features.txt <==
0.868896136902
0.542305564899
0.115903893374
0.503297862357
0.564961631104

==> disease-004/features.txt <==
0.868896136902
0.542305564899
0.115903893374
0.503297862357
0.564961631104
```
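The folder layout above can also be produced and checked from Python. The following is a minimal sketch under the rules listed in this section (one folder per subject, one `features.txt` per folder, one number per line, equal counts across subjects). The helper names and the random feature values are assumptions for illustration, and the check is not neuropredict's own validation code.

```python
import random
import tempfile
from pathlib import Path


def write_features(root, subject_id, values):
    """Write one floating point number per line to <root>/<subject_id>/features.txt."""
    subject_dir = Path(root) / subject_id
    subject_dir.mkdir(parents=True, exist_ok=True)
    text = "".join(f"{value!r}\n" for value in values)
    (subject_dir / "features.txt").write_text(text)


def load_features(root, subject_ids):
    """Read features for each subject and check that the counts agree."""
    features = {}
    for subject_id in subject_ids:
        lines = (Path(root) / subject_id / "features.txt").read_text().split()
        features[subject_id] = [float(line) for line in lines]
    lengths = {len(values) for values in features.values()}
    if len(lengths) != 1:
        raise ValueError(f"subjects differ in number of features: {sorted(lengths)}")
    return features


# Random placeholder values; real features would come from your own pipeline.
rng = random.Random(0)
subject_ids = ["control-001", "control-002", "disease-003", "disease-004"]
written = {sid: [rng.random() for _ in range(5)] for sid in subject_ids}

root = Path(tempfile.mkdtemp()) / "myawsomepipeline"
for sid, values in written.items():
    write_features(root, sid, values)

loaded = load_features(root, subject_ids)
print({sid: len(values) for sid, values in loaded.items()})
```

Running a check like `load_features` before calling neuropredict catches a missing file or a subject with too few lines early, while the folder is still easy to fix.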

## Automatic readers currently supported
* Freesurfer
  * Subcortical volumes
  * Wholebrain Aseg stats

## Automatic readers in development (stay tuned)
* Freesurfer
  * cortical thickness
  * gray matter density
  * structural covariance
* Any nibabel-readable data
* DT-MRI features
* task-free fMRI features
* HCP datasets
* Weka's ARFF format

# Installation

neuropredict can be installed by issuing the following command:
```bash
pip install -U neuropredict
```

If `pip` throws an error, re-run the above command a few times; most errors usually get resolved.

Installing it with admin privileges is the recommended way. If you do not have admin privileges, try this instead:
```
pip install -U neuropredict --user
```

With a `--user` install, you may also need to add the location of the installed executables to your path by adding this line to your login script:
```
export PATH=$PATH:~/.local/bin/
```
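To check whether the `neuropredict` command is visible on your `PATH` after installation, a small standard-library sketch such as the one below can help. The helper name `find_executable` is an assumption for illustration; `shutil.which` performs the actual lookup.

```python
import shutil


def find_executable(name="neuropredict"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(name)


location = find_executable()
if location is None:
    print("neuropredict was not found on PATH; "
          "for a --user install, check that ~/.local/bin is included.")
else:
    print(f"neuropredict found at {location}")
```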

# Usage

```
usage: neuropredict [-h] -m METADATAFILE -o OUTDIR [-f FSDIR]
                    [-u USERDIR [USERDIR ...]] [-p POSITIVECLASS]
                    [-t TRAIN_PERC] [-n NUM_REP_CV] [-a ATLASID]

optional arguments:
  -h, --help            show this help message and exit
  -m METADATAFILE, --metadatafile METADATAFILE
                        Abs path to file containing metadata for subjects to
                        be included for analysis. At the minimum, each subject
                        should have an id per row followed by the class it
                        belongs to. E.g. sub001,control sub002,control
                        sub003,disease sub004,disease
  -o OUTDIR, --outdir OUTDIR
                        Output folder to store features and results.
  -f FSDIR, --fsdir FSDIR
                        Absolute path to SUBJECTS_DIR containing the finished
                        runs of Freesurfer parcellation (each subject named
                        after its ID in the metadata file)
  -u USERDIR [USERDIR ...], --userdir USERDIR [USERDIR ...]
                        List of absolute paths to a user's own features. Each
                        folder contains a separate folder for each subject
                        (named after its ID in the metadata file) containing a
                        file called features.txt with one number per line. All
                        the subjects (in a given folder) must have the same
                        number of features (#lines in file). Different folders
                        can have different number of features for each subject.
                        Names of each folder is used to annotate the results
                        in visualizations. Hence name them uniquely and
                        meaningfully, keeping in mind these figures will be
                        included in your papers.
  -p POSITIVECLASS, --positiveclass POSITIVECLASS
                        Name of the positive class (Alzheimers, MCI or
                        Parkinsons etc) to be used in calculation of area
                        under the ROC curve. Default: class appearing second
                        in order specified in metadata file.
  -t TRAIN_PERC, --trainperc TRAIN_PERC
                        Percentage of the smallest class to be reserved for
                        training. Must be in the interval [0.01 0.99]. If
                        sample size is sufficiently big, we recommend 0.5. If
                        sample size is small, or class imbalance is high,
                        choose 0.8.
  -n NUM_REP_CV, --numrep NUM_REP_CV
                        Number of repetitions of the repeated-holdout cross-
                        validation. The larger the number, the better the
                        estimates will be.
  -a ATLASID, --atlas ATLASID
                        Name of the atlas to use for visualization. Default:
                        fsaverage, if available.
```
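The `--trainperc` and `--numrep` options describe a repeated-holdout scheme in which the training set size is derived from the smallest class. The sketch below shows one reading of that description in plain Python. It is not neuropredict's implementation: the function name, the rounding (floor) and the choice to put all remaining subjects into the test set are assumptions, and the subject IDs are made up.

```python
import random
from collections import defaultdict


def repeated_holdout(labels, train_perc=0.5, num_rep=10, seed=0):
    """Yield (train_ids, test_ids) pairs for repeated stratified holdout.

    labels maps subject ID -> class name. Every class contributes the same
    number of training subjects: train_perc times the smallest class size.
    """
    if not 0.01 <= train_perc <= 0.99:
        raise ValueError("train_perc must be in the interval [0.01, 0.99]")
    by_class = defaultdict(list)
    for subject_id, cls in labels.items():
        by_class[cls].append(subject_id)
    smallest = min(len(ids) for ids in by_class.values())
    n_train = max(1, int(train_perc * smallest))  # assumption: round down
    rng = random.Random(seed)
    for _ in range(num_rep):
        train, test = [], []
        for ids in by_class.values():
            shuffled = rng.sample(ids, len(ids))  # fresh shuffle per repetition
            train.extend(shuffled[:n_train])
            test.extend(shuffled[n_train:])
        yield train, test


# Class sizes from the example above: 5 controls, 6 disease_one, 9 other_disease.
labels = {}
for cls, size in [("controls", 5), ("disease_one", 6), ("other_disease", 9)]:
    for i in range(size):
        labels[f"{cls}_{i}"] = cls

splits = list(repeated_holdout(labels, train_perc=0.8, num_rep=3))
for train, test in splits:
    print(len(train), "training subjects,", len(test), "test subjects")
```

Tying the training size to the smallest class keeps the training set balanced even when the classes are not, which is presumably why the help text suggests a higher percentage for small or imbalanced samples.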

# Dependencies
* numpy
* scikit-learn
* pyradigm
* nibabel
