Extreme multiclass and multi-label classification

myriad

Multiclass classification with tens of thousands of classes

Usage

Datasets

Name               Function         Size      Samples    Features   Labels   Multi-label  Labels/sample
DMOZ               load_dmoz        614.8 MB  394,756    833,484    36,372                1.02
Wikipedia (small)  load_wiki_small  135.5 MB  456,886    2,085,165  36,504                1.84
Wikipedia (large)  load_wiki_large  1.01 GB   2,365,436  2,085,167  325,056               3.26

Each load_* function returns two arrays which contain the features and the target classes, respectively. In the multi-label case, the target array is 2D. The arrays are sparse when applicable.

>>> from myriad import datasets

>>> X, y = datasets.load_dmoz()
>>> X

>>> y
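
To make the 2D multi-label target more concrete, here is a minimal sketch of a sparse label matrix built with SciPy. It assumes the target is a binary indicator matrix with one row per sample and one column per class, which the description above suggests but does not confirm. The toy data and the names label_sets, n_classes and Y are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy label sets for 4 samples over 6 classes (made-up data, not from myriad).
label_sets = [[0], [2, 5], [1], [0, 3, 4]]
n_classes = 6

# One (row, col) pair per (sample, label) assignment.
rows = [i for i, labels in enumerate(label_sets) for _ in labels]
cols = [label for labels in label_sets for label in labels]
data = np.ones(len(cols), dtype=np.int8)

# Assumed layout: Y[i, j] == 1 when sample i carries label j.
Y = csr_matrix((data, (rows, cols)), shape=(len(label_sets), n_classes))

# Average number of labels per sample, the quantity reported
# in the "Labels/sample" column of the datasets table.
labels_per_sample = Y.sum() / Y.shape[0]
```

With tens or hundreds of thousands of classes, a dense matrix of this shape would be almost entirely zeros, which is why a sparse representation is the natural choice.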

The first time you call a load_* function, the data is downloaded and saved to a .svm file in the LIBSVM format. If you interrupt a loader while it is working, it will restart from scratch on the next call.
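
For readers unfamiliar with the LIBSVM format, the sketch below parses a single line by hand. Each line holds a label followed by index:value pairs for the non-zero features. The comma-separated label list is a common multi-label extension of the format; whether the files written by myriad use it is an assumption. parse_libsvm_line is a hypothetical helper written for this illustration, not part of myriad or svmloader.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-style line into (labels, {feature_index: value})."""
    # Drop an optional trailing comment and surrounding whitespace.
    line = line.split("#", 1)[0].strip()
    head, *pairs = line.split()
    # Assumed multi-label convention: labels separated by commas.
    labels = [int(token) for token in head.split(",")]
    features = {}
    for pair in pairs:
        index, value = pair.split(":", 1)
        features[int(index)] = float(value)
    return labels, features


# A made-up sample line: two labels and three non-zero features.
sample = "33,127 5:1 18:2.5 4096:1"
labels, features = parse_libsvm_line(sample)
```

Only non-zero features are stored, so a sample with millions of possible features still fits on one short line.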

All datasets are loaded into memory with the svmloader library, which is much faster than scikit-learn's load_svmlight_file function. When working repeatedly on the same dataset, however, it is recommended to wrap the dataset loader with joblib.Memory.cache, which stores a memory-mapped copy of the result of the first call. Subsequent calls then load almost instantly.
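
The caching pattern can be sketched as follows. So that the example runs without downloading anything, slow_loader is a hypothetical stand-in for a real loader such as datasets.load_dmoz, and the cache lives in a temporary directory; in practice you would point Memory at a persistent path of your choosing.

```python
import tempfile

import numpy as np
from joblib import Memory

# Hypothetical stand-in for a slow loader such as datasets.load_dmoz.
# It counts its calls so the effect of caching is visible.
calls = {"n": 0}


def slow_loader():
    calls["n"] += 1
    X = np.arange(12, dtype=np.float64).reshape(4, 3)
    y = np.array([0, 1, 1, 2])
    return X, y


# Temporary cache directory for the sketch; use a persistent path in practice.
cache_dir = tempfile.mkdtemp()
# mmap_mode="r" asks joblib to memory-map cached arrays in read-only mode.
memory = Memory(cache_dir, mmap_mode="r", verbose=0)

cached_loader = memory.cache(slow_loader)

X1, y1 = cached_loader()  # first call: runs slow_loader and stores the result
X2, y2 = cached_loader()  # second call: served from the cache on disk
```

With a real loader the same wrapping would presumably look like memory.cache(datasets.load_dmoz). Because the arrays come back read-only under mmap_mode="r", copy them before modifying them in place.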

You can see where the datasets are stored like so:

>>> datasets.get_data_home()

Benchmarks
