Scikit-learn Wrapper for Regularized Greedy Forest

rgf_python

A Python wrapper for the Regularized Greedy Forest (RGF) [1] machine learning algorithm.

Features

Scikit-learn interface and support for multiclass classification problems.

The original RGF implementation supports only regression and binary classification, but rgf_python also supports multiclass classification via the “One-vs-Rest” method.

FastRGF (alpha version) is supported. Please see this guide.

Example:

from sklearn import datasets
from sklearn.utils.validation import check_random_state
from sklearn.model_selection import StratifiedKFold, cross_val_score
from rgf.sklearn import RGFClassifier

iris = datasets.load_iris()
rng = check_random_state(0)
perm = rng.permutation(iris.target.size)
iris.data = iris.data[perm]
iris.target = iris.target[perm]

rgf = RGFClassifier(max_leaf=400,
                    algorithm="RGF_Sib",
                    test_interval=100,
                    verbose=True)

n_folds = 3

rgf_scores = cross_val_score(rgf,
                             iris.data,
                             iris.target,
                             cv=StratifiedKFold(n_folds))

rgf_score = sum(rgf_scores)/n_folds
print('RGF Classifier score: {0:.5f}'.format(rgf_score))

More examples can be found here.
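
The “One-vs-Rest” method mentioned under Features can be illustrated with scikit-learn's generic OneVsRestClassifier. This is only a sketch of the idea: LogisticRegression is a stand-in binary learner chosen for illustration, and nothing here is taken from the internals of rgf_python.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

iris = datasets.load_iris()

# One binary classifier is trained per class ("this class" vs. "all others").
# LogisticRegression is a stand-in binary learner, not what rgf_python uses.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(iris.data, iris.target)

# The predicted label is the class whose binary model is most confident.
predictions = ovr.predict(iris.data)
print('Binary models trained: {0}'.format(len(ovr.estimators_)))
```

With three iris classes, three binary models are fitted, one per class.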

Software Requirements

  • Python (2.7 or >= 3.4)

  • scikit-learn (>= 0.18)

Installation

From PyPI using pip:

pip install rgf_python

or from GitHub:

git clone https://github.com/fukatani/rgf_python.git
cd rgf_python
python setup.py install

If you have any problems installing with the methods listed above, you should build the RGF executable from source yourself and place the compiled executable in a directory that is included in the PATH environment variable or in the directory of the installed package. Alternatively, you may specify the actual location of the RGF executable and the directory for temporary files with the corresponding flags in the configuration file .rgfrc, which you should create in your home directory. The default values are platform dependent: on Windows exe_location=$HOME/rgf.exe and temp_location=$HOME/temp/rgf; on other platforms exe_location=$HOME/rgf and temp_location=/tmp/rgf. Here is an example of a .rgfrc file:

exe_location=C:/Program Files/RGF/bin/rgf.exe
temp_location=C:/Program Files/RGF/temp
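
As a rough sketch, the same file can be produced from Python. The helper name build_rgfrc is hypothetical and not part of rgf_python; the paths are the ones from the example above, and the only assumption about the format is the key=value layout shown there.

```python
import os


def build_rgfrc(exe_location, temp_location):
    """Return the text of a .rgfrc file with one key=value pair per line."""
    return 'exe_location={0}\ntemp_location={1}\n'.format(exe_location,
                                                          temp_location)


# Example paths, reused from the .rgfrc example; adjust them for your machine.
content = build_rgfrc('C:/Program Files/RGF/bin/rgf.exe',
                      'C:/Program Files/RGF/temp')

# The configuration file is expected in the home directory.
rgfrc_path = os.path.join(os.path.expanduser('~'), '.rgfrc')
print('Write the following to {0}:'.format(rgfrc_path))
print(content)
```

The sketch only prints the content so that it never overwrites an existing configuration; writing it out is a matter of opening rgfrc_path in 'w' mode and calling write(content).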

You can also install the package without automatic compilation:

pip install rgf_python --install-option=--nocompilation

or

git clone https://github.com/fukatani/rgf_python.git
cd rgf_python
python setup.py install --nocompilation

sudo (or administrator privileges on Windows) may be needed to run these commands.

The following guide explains how to build the RGF executable from source. The resulting file will be placed in the rgf_python/include/rgf/bin folder.

Windows

Precompiled file

This is the easiest way: just take the precompiled file from rgf_python/include/rgf/bin. On 32-bit Windows, rename rgf32.exe to rgf.exe and use that file.

Visual Studio (existing solution)

  1. Open the directory rgf_python/include/rgf/Windows/rgf.

  2. Open the rgf.sln file with Visual Studio and choose BUILD->Build Solution (Ctrl+Shift+B). If you are asked to upgrade the solution file after opening it, click OK. If you get errors about the Platform Toolset, go to PROJECT->Properties->Configuration Properties->General and select the toolset installed on your machine.

MinGW (existing makefile)

Build the executable with MinGW g++ from the existing makefile (you may want to customize this file for your environment).

cd rgf_python/include/rgf/build
mingw32-make

CMake and Visual Studio

Create a solution file with CMake and then compile it with Visual Studio.

cd rgf_python/include/rgf/build
cmake ../ -G "Visual Studio 10 2010"
cmake --build . --config Release

If you are compiling on a 64-bit machine, add Win64 to the end of the generator’s name: Visual Studio 10 2010 Win64. We tested the following versions of Visual Studio:

  • Visual Studio 10 2010 [Win64]

  • Visual Studio 11 2012 [Win64]

  • Visual Studio 12 2013 [Win64]

  • Visual Studio 14 2015 [Win64]

  • Visual Studio 15 2017 [Win64]

Other versions may work but are untested.

CMake and MinGW

Create a makefile with CMake and then compile it with MinGW.

cd rgf_python/include/rgf/build
cmake ../ -G "MinGW Makefiles"
cmake --build . --config Release

*nix

g++ (existing makefile)

Build the executable with g++ from the existing makefile (you may want to customize this file for your environment).

cd rgf_python/include/rgf/build
make

CMake

Create a makefile with CMake and then compile.

cd rgf_python/include/rgf/build
cmake ../
cmake --build . --config Release

Docker image

We provide a Docker image with rgf_python installed.

# Run docker image
docker run -it fukatani/rgf_python /bin/bash
# Run RGF example
python ./rgf_python/examples/comparison_RGF_and_RF_regressors_on_boston_dataset.py
# Run FastRGF Example
python ./rgf_python/examples/fast_rgf/FastRGF_classifier_on_iris_dataset.py

Tuning Hyper-parameters

You can tune hyper-parameters as follows.

  • max_leaf: Appropriate values are data-dependent and usually range from 1000 to 10000.

  • test_interval: For efficiency, it should be either a multiple or a divisor of 100 (the default value of the optimization interval).

  • algorithm: You can select “RGF”, “RGF_Opt” or “RGF_Sib”.

  • loss: You can select “LS”, “Log” or “Expo”.

  • reg_depth: Must be no smaller than 1. Meant to be used with algorithm = “RGF_Opt” or “RGF_Sib”.

  • l2: Values of 1, 0.1, or 0.01 often produce good results, though with exponential loss (loss = “Expo”) and logistic loss (loss = “Log”) some data require smaller values such as 1e-10 or 1e-20.

  • sl2: Default value is equal to l2. On some data, l2/100 works well.

  • normalize: If turned on, training targets are normalized so that the average becomes zero.

  • min_samples_leaf: Smaller values may slow down training. Values that are too large may degrade model accuracy.

  • n_iter: Number of iterations of coordinate descent to optimize weights.

  • n_tree_search: Number of trees to be searched for the nodes to split. The most recently grown trees are searched first.

  • opt_interval: Weight optimization interval in terms of the number of leaf nodes.

  • learning_rate: Step size of Newton updates used in coordinate descent to optimize weights.

Detailed instructions for tuning hyper-parameters are here.
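
The rules of thumb for test_interval, l2 and sl2 can be turned into a small candidate generator. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the candidate values are simply the ones quoted in the list above, and the code does not call rgf_python itself.

```python
from itertools import product


def is_efficient_test_interval(test_interval, opt_interval=100):
    """Check that test_interval is a multiple or a divisor of opt_interval."""
    if test_interval <= 0 or opt_interval <= 0:
        return False
    return (test_interval % opt_interval == 0
            or opt_interval % test_interval == 0)


def candidate_params(max_leaf_values=(1000, 5000, 10000),
                     l2_values=(1.0, 0.1, 0.01)):
    """Yield hyper-parameter dicts following the rules of thumb above."""
    for max_leaf, l2 in product(max_leaf_values, l2_values):
        # sl2 defaults to l2; l2/100 is the alternative suggested above.
        for sl2 in (l2, l2 / 100):
            yield {'max_leaf': max_leaf, 'l2': l2, 'sl2': sl2}


candidates = list(candidate_params())
print('Number of candidate settings: {0}'.format(len(candidates)))
```

Each dictionary could then be tried in turn, for instance as RGFClassifier(**params) inside cross_val_score, assuming the estimator accepts these names as keyword arguments in the same way it accepts max_leaf in the example near the top.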

Using in Kaggle Kernels

Kaggle Kernels now support rgf_python. Please see this page.

Troubleshooting

  • rgf_python raises an error while fitting or predicting.

First, run test.py to confirm that the installation succeeded.

If the tests pass, these pages may help:

  1. https://github.com/fukatani/rgf_python/issues/13 (Datasets that include strings)

  2. https://github.com/fukatani/rgf_python/issues/75 (Temporary file capacity exceeded in a Kaggle Kernel)

If you can’t solve your problem, feel free to open a new issue.

License

rgf_python is distributed under the GNU General Public License v3 (GPLv3). Please read file LICENSE for more information.

rgf_python includes RGF version 1.2, which is distributed under the GPLv3. The original CLI implementation of RGF can be downloaded from http://tongzhang-ml.org/software/rgf.

rgf_python includes FastRGF version 0.5, which is distributed under the MIT license. The original CLI implementation of FastRGF can be downloaded from https://github.com/baidu/fast_rgf.

Many thanks to Rie Johnson and Tong Zhang (the authors of RGF).

Other

Shamelessly, much of the implementation is based on the following code. Thanks!

Reference

[1] Rie Johnson and Tong Zhang, Learning Nonlinear Functions Using Regularized Greedy Forest
