
Hive-ML


Hive-ML is a Python Package collecting the tools and scripts to run Machine Learning experiments on Radiological Medical Imaging.

Install

To install Hive-ML:

pip install hive-ml

or from GitHub:

git clone <Hive_ML repository URL>
pip install -e Hive_ML

Description

The Hive-ML workflow consists of several sequential steps, including Radiomics extraction, Sequential Forward Feature Selection, and Model Fitting, reporting the classifier performance (ROC-AUC, Sensitivity, Specificity, Accuracy) in tabular format and tracking all the steps on an MLflow server.
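The reported metrics can be illustrated with a minimal, self-contained sketch (a hypothetical helper computing them from binary predictions; ROC-AUC is omitted since it requires prediction scores rather than hard labels):

```python
def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, and accuracy from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),        # true-positive rate
        "specificity": tn / (tn + fp),        # true-negative rate
        "accuracy": (tp + tn) / len(y_true),
    }
```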

In addition, Hive-ML provides a Docker Image, Kubernetes Deployment and Slurm Job, with the corresponding set of instructions to easily reproduce the experiments.

Finally, Hive-ML also supports model serving through MLflow, providing easy access to the trained classifier for future use in model prediction.
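A served model can then be queried over HTTP; a sketch using the standard MLflow CLI (the run ID and the feature payload below are placeholders, not values produced by Hive-ML):

```shell
# Serve a trained classifier logged to MLflow (model URI is a placeholder)
mlflow models serve -m "runs:/<run_id>/model" --port 5000

# Score a feature vector against the REST endpoint
curl -X POST http://localhost:5000/invocations \
     -H "Content-Type: application/json" \
     -d '{"inputs": [[0.12, 0.85, 1.40]]}'
```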

In the tutorial explained below, Hive-ML is used to predict the Pathological Complete Response after a Neo-Adjuvant chemotherapy, from DCE-MRI.

Usage

The Hive-ML workflow is controlled from a JSON configuration file, which the user can customize for each experiment run.

Example:

    {
      "image_suffix": "_image.nii.gz",  # File suffix (or list of File suffixes) of the files containing the image volume.
      "mask_suffix": "_mask.nii.gz",    # File suffix (including file extension) of the files containing the segmentation mask of the ROI.
      "label_dict": {                   # Dictionary describing the classes. The key-value pair contains the label value as key (starting from 0) and the class description as value.
        "0": "non-pCR",
        "1": "pCR"
      },
      "models": {                       # Dictionary for all the classifiers to evaluate. Each element includes the classifier class name and an additional dictionary with the kwargs to pass to the classifier object.
        "rf": {
          "criterion": "gini",
          "n_estimators": 100,
          "max_depth": 10
        },
        "adab": {
          "criterion": "gini",
          "n_estimators": 100,
          "max_depth": 10
        },
        "knn": {},
        "lda": {},
        "qda": {},
        "logistic_regression": {},
        "svm": {
          "kernel": "rbf"
        },
        "naive": {}
      },
      "perfusion_maps": {                # Dictionary describing the perfusion maps to extract. Each element includes the perfusion map name and the file suffix used to save the perfusion map.
        "distance_map": "_distance_map.nii.gz",
        "distance_map_depth": {
          "suffix": "_distance_map_depth.nii.gz",
          "kwargs": [
            2
          ]
        },
        "ttp": "_ttp_map.nii.gz",
        "cbv": "_cbv_map.nii.gz",
        "cbf": "_cbf_map.nii.gz",
        "mtt": "_mtt_map.nii.gz"
      },
      "feature_selection": "SFFS",       # Type of Feature Selection to perform. Supported values are SFFS and PCA.
      "n_features": 30,                  # Number of features to preserve when performing Feature Selection.
      "n_folds": 5,                      # Number of folds to run cross-validation.
      "random_seed": 12345,              # Random seed number used when randomizing events and actions.
      "feature_aggregator": "SD"         # Aggregation strategy used when extracting features in the 4D. 
                                         # Supported values are: ``Flat`` (no aggregation, all features are preserved),
                                         #                       ``Mean`` (Average over the 4-th dimension),
                                         #                        ``SD`` (Standard Deviation over the 4-th dimension),
                                         #                        ``Mean_Norm`` (Independent channel-normalization, followed by average over the 4-th dimension),
                                         #                        ``SD_Norm`` (Independent channel-normalization, followed by SD over the 4-th dimension)
      "k_ensemble": [1,5],               # List of k values to select top-k best models in ensembling.
      "metric_best_model": "roc_auc",    # Classification Metric to consider when determining the best models from CV results.
      "reduction_best_model": "mean"     # Reduction to perform on CV scores to determine the best models.
    }
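The `feature_aggregator` strategies can be sketched with NumPy; this is an illustration of the aggregation semantics over the 4th (time) dimension, not Hive-ML's actual implementation:

```python
import numpy as np

def aggregate_features(features, strategy="SD"):
    """Collapse per-frame features of shape (n_timepoints, n_features)."""
    if strategy == "Flat":
        return features.ravel()              # no aggregation, keep every frame
    if strategy == "Mean":
        return features.mean(axis=0)         # average over the 4th dimension
    if strategy == "SD":
        return features.std(axis=0)          # standard deviation over the 4th dimension
    if strategy in ("Mean_Norm", "SD_Norm"):
        # normalize each feature channel independently, then aggregate
        norm = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
        return norm.mean(axis=0) if strategy == "Mean_Norm" else norm.std(axis=0)
    raise ValueError(f"Unknown strategy: {strategy}")
```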

Perfusion Maps Generation

Given a 4D Volume, to extract the perfusion maps (TTP, CBV, CBF, MTT) run:

 Hive_ML_generate_perfusion_maps -i </path/to/data_folder> --config-file <config_file.json>

For more details, follow the Jupyter Notebook Tutorial: Generate Perfusion Maps
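The four maps have standard per-voxel definitions that can be sketched with NumPy. This is a simplified illustration (TTP as time of peak enhancement, CBV as area under the enhancement curve, CBF as maximum upslope, MTT from the central volume principle), without the AIF handling a full implementation would need:

```python
import numpy as np

def perfusion_maps(signal, t):
    """signal: (T, X, Y, Z) DCE series; t: (T,) acquisition times."""
    ttp = t[np.argmax(signal, axis=0)]                        # time-to-peak
    dt = np.diff(t).reshape(-1, 1, 1, 1)
    cbv = (0.5 * (signal[1:] + signal[:-1]) * dt).sum(axis=0)  # trapezoidal AUC
    cbf = np.gradient(signal, t, axis=0).max(axis=0)           # maximum upslope
    mtt = np.divide(cbv, cbf, out=np.zeros_like(cbv), where=cbf > 0)  # MTT = CBV / CBF
    return ttp, cbv, cbf, mtt
```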


Feature Extraction

To extract Radiomics/Radiodynamics from the 4D Volume, run:

 Hive_ML_extract_radiomics --data-folder </path/to/data_folder> --config-file <config_file.json> --feature-param-file </path/to/radiomics_config.yaml> --output-file </path/to/feature_file>


For more details, follow the Jupyter Notebook Tutorial: Extract Features
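Conceptually, features are extracted frame by frame from the 4D series and only then aggregated. A minimal sketch, with simple first-order statistics standing in for the PyRadiomics features Hive-ML actually computes:

```python
import numpy as np

def first_order_features(frame, mask):
    """A few first-order statistics over the ROI of a single 3D frame."""
    roi = frame[mask > 0]
    return [roi.mean(), roi.std(), roi.min(), roi.max()]

def extract_4d(volume4d, mask):
    """One feature vector per time frame; aggregation (Mean/SD/...) happens later."""
    return np.array([first_order_features(frame, mask) for frame in volume4d])
```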

Feature Selection

To run Feature Selection:

 Hive_ML_feature_selection --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>

The Feature Selection report (in JSON format, including the selected features and validation scores for each classifier) will be available at the following path:

$ROOT_FOLDER/<EXPERIMENT_ID>/SFFS


For more details, follow the Jupyter Notebook Tutorial: Feature Selection
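Sequential forward selection greedily adds whichever remaining feature most improves a scoring function, until `n_features` are kept. A self-contained sketch (the `score_fn` callback is a placeholder for the cross-validated classifier score used in practice):

```python
import numpy as np

def forward_select(X, y, n_features, score_fn):
    """Greedy sequential forward selection over the columns of X."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        # pick the candidate whose addition maximizes the score
        best = max(remaining, key=lambda j: score_fn(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```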

Model Fitting

To perform Model Fitting on the Selected features:

 Hive_ML_model_fitting --feature-file </path/to/feature_file> --config-file <config_file.json> --experiment-name <EXPERIMENT_ID>

The experiment validation reports, plots, and summaries will be available at the following path:

$ROOT_FOLDER/<EXPERIMENT_ID>


For more details, follow the Jupyter Notebook Tutorial: Model Fitting
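The cross-validation loop behind model fitting can be sketched in a self-contained way; here a trivial nearest-centroid classifier stands in for the configured scikit-learn models, while `n_folds`, the random seed, and the mean reduction mirror the configuration keys shown above:

```python
import numpy as np

def cross_validate(X, y, n_folds=5, seed=12345):
    """K-fold CV of a nearest-centroid classifier; returns mean accuracy."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
        # "fit": one centroid per class; "predict": nearest centroid
        centroids = {c: X[train][y[train] == c].mean(axis=0) for c in np.unique(y[train])}
        pred = [min(centroids, key=lambda c: np.linalg.norm(x - centroids[c])) for x in X[test]]
        scores.append(np.mean(np.array(pred) == y[test]))
    return float(np.mean(scores))  # "reduction_best_model": "mean"
```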
