Skip to main content

Modern decision trees in Python

Project description

Code style: black CircleCI Main Checked with mypy codecov

scikit-tree

scikit-tree is a scikit-learn compatible API for building state-of-the-art decision trees. These include unsupervised trees, oblique trees, uncertainty trees, quantile trees and causal trees.

Tree-models have withstood the test of time, and are consistently used for modern-day data science and machine learning applications. They especially perform well when there are limited samples for a problem and are flexible learners that can be applied to a wide variety of different settings, such as tabular, images, time-series, genomics, EEG data and more.

We welcome contributions for modern tree-based algorithms. We use Cython to achieve fast C/C++ speeds, while abiding by a scikit-learn compatible (tested) API. Moreover, our Cython internals are easily extensible because they follow the internal Cython API of scikit-learn as well.

Submodule dependency on a fork of scikit-learn Due to the current state of scikit-learn's internal Cython code for trees, we have to instead leverage a maintained fork of scikit-learn at https://github.com/neurodata/scikit-learn, where specifically, the fork branch is used to build and install this repo. We keep that fork well-maintained and up-to-date with respect to the main sklearn repo. The only difference is the refactoring of the tree/ submodule. This fork is used internally under the namespace sktree._lib.sklearn. It is necessary to use this fork for anything related to:

  • RandomForest*
  • ExtraTrees*
  • or any importable items from the tree/ submodule, whether it is a Cython or Python object

If you are developing for scikit-tree, we will always depend on the most up-to-date commit of https://github.com/neurodata/scikit-learn/submodulev2 as a submodule within scikit-tee. This branch is consistently maintained for changes upstream that occur in the scikit-learn tree submodule. This ensures that our fork maintains consistency and robustness due to bug fixes and improvements upstream.

Documentation

See here for the documentation for our dev version: https://docs.neurodata.io/scikit-tree/dev/index.html

Why oblique trees and why trees beyond those in scikit-learn?

In 2001, Leo Breiman proposed two types of Random Forests. One was known as Forest-RI, which is the axis-aligned traditional random forest. One was known as Forest-RC, which is the random oblique linear combinations random forest. This leveraged random combinations of features to perform splits. MORF builds upon Forest-RC by proposing additional functions to combine features. Other modern tree variants such as Canonical Correlation Forests (CCF), or unsupervised random forests are also important at solving real-world problems using robust decision tree models.

Installation

Our installation will try to follow scikit-learn installation as close as possible, as we contain Cython code subclassed, or inspired by the scikit-learn tree submodule.

AS OF NOW, scikit-tree is in development stage and the installation is still finicky due to the upstream scikit-learn's stalled refactoring PRs of the tree submodule. Once those are merged, the installation will be simpler. The current recommended installation is done locally with meson.

Dependencies

We minimally require:

* Python (>=3.8)
* numpy
* scipy
* scikit-learn >= 1.3

Installation with Pip

pip install sktree

Building locally with Meson (RECOMMENDED)

Make sure you have the necessary packages installed

# install build dependencies
pip install numpy scipy meson ninja meson-python Cython scikit-learn scikit-learn-tree

# you may need these optional dependencies to build scikit-learn locally
conda install -c conda-forge joblib threadpoolctl pytest compilers llvm-openmp

We use the spin CLI to abstract away build details:

# run the build using Meson/Ninja
./spin build

# you can run the following command to see what other options there are
./spin --help
./spin build --help

# For example, you might want to start from a clean build
./spin build --clean

# or build in parallel for faster builds
./spin build -j 2

# you will need to double check the build-install has the proper path 
# this might be different from machine to machine
export PYTHONPATH=${PWD}/build-install/usr/lib/python3.9/site-packages

# run specific unit tests
./spin test -- sktree/tree/tests/test_tree.py

# you can bring up the CLI menu
./spin --help

You can also do the same thing using Meson/Ninja itself. Run the following to build the local files:

# generate ninja make files
meson build --prefix=$PWD/build

# compile
ninja -C build

# install scikit-tree package
meson install -C build

export PYTHONPATH=${PWD}/build/lib/python3.9/site-packages

# to check installation, you need to be in a different directory
cd docs;  
python -c "from sktree import tree"
python -c "import sklearn; print(sklearn.__version__);"

Alternatively, you can use editable installs

pip install --no-build-isolation --editable .

References

[1]: Li, Adam, et al. "Manifold Oblique Random Forests: Towards Closing the Gap on Convolutional Deep Networks." arXiv preprint arXiv:1909.11799 (2019)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scikit_tree-0.1.4.tar.gz (13.8 MB view hashes)

Uploaded Source

Built Distributions

scikit_tree-0.1.4-cp311-cp311-win_amd64.whl (12.4 MB view hashes)

Uploaded CPython 3.11 Windows x86-64

scikit_tree-0.1.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

scikit_tree-0.1.4-cp311-cp311-macosx_11_0_arm64.whl (1.3 MB view hashes)

Uploaded CPython 3.11 macOS 11.0+ ARM64

scikit_tree-0.1.4-cp311-cp311-macosx_10_9_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.11 macOS 10.9+ x86-64

scikit_tree-0.1.4-cp310-cp310-win_amd64.whl (12.4 MB view hashes)

Uploaded CPython 3.10 Windows x86-64

scikit_tree-0.1.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

scikit_tree-0.1.4-cp310-cp310-macosx_11_0_arm64.whl (1.4 MB view hashes)

Uploaded CPython 3.10 macOS 11.0+ ARM64

scikit_tree-0.1.4-cp310-cp310-macosx_10_9_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.10 macOS 10.9+ x86-64

scikit_tree-0.1.4-cp39-cp39-win_amd64.whl (12.5 MB view hashes)

Uploaded CPython 3.9 Windows x86-64

scikit_tree-0.1.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.9 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

scikit_tree-0.1.4-cp39-cp39-macosx_11_0_arm64.whl (1.4 MB view hashes)

Uploaded CPython 3.9 macOS 11.0+ ARM64

scikit_tree-0.1.4-cp39-cp39-macosx_10_9_x86_64.whl (1.5 MB view hashes)

Uploaded CPython 3.9 macOS 10.9+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page