Extreme multiclass and multi-label classification
myriad
Multiclass classification with tens of thousands of classes
Usage
Datasets
Name | Function | Size | Samples | Features | Labels | Multi-label | Labels/sample |
---|---|---|---|---|---|---|---|
DMOZ | `load_dmoz` | 614.8 MB | 394,756 | 833,484 | 36,372 | ✓ | 1.02 |
Wikipedia (small) | `load_wiki_small` | 135.5 MB | 456,886 | 2,085,165 | 36,504 | ✓ | 1.84 |
Wikipedia (large) | `load_wiki_large` | 1.01 GB | 2,365,436 | 2,085,167 | 325,056 | ✓ | 3.26 |
Each `load_*` function returns two arrays containing the features and the target classes, respectively. In the multi-label case, the target array is 2D. The arrays are sparse when applicable.
```python
>>> from myriad import datasets
>>> X, y = datasets.load_dmoz()
>>> X
>>> y
```
The first time you call a `load_*` function, the data is downloaded and saved to a `.svm` file that follows the LIBSVM format convention. If you interrupt a loader while it is running, it will start over from scratch on the next call.
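For reference, a LIBSVM-formatted file stores one sample per line: the label(s) first, then the non-zero features as sparse `index:value` pairs; in the multi-label case the labels are comma-separated. The lines below are an invented illustration of the layout, not actual data from these datasets:

```
0 3:0.41 17:1.0 8452:0.07
2,5 1:0.93 17:0.5
```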
All of the datasets are loaded into memory with the `svmloader` library, which is much faster than scikit-learn's `load_svmlight_file` function. However, when working repeatedly with the same dataset, it is recommended to wrap the loader with `joblib.Memory.cache` to store a memmapped copy of the first call's result. This makes subsequent loads near instantaneous.
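A minimal sketch of that caching pattern. The cache directory name and the stand-in loader below are illustrative only; in practice you would wrap e.g. `datasets.load_dmoz` the same way:

```python
import numpy as np
from joblib import Memory

# Cache results in a local directory; mmap_mode="r" memory-maps the
# stored arrays on reload instead of copying them back into RAM.
memory = Memory("./myriad_cache", mmap_mode="r", verbose=0)

@memory.cache
def load_dataset():
    # Stand-in for a myriad loader such as datasets.load_dmoz();
    # any function returning arrays can be cached the same way.
    return np.arange(6).reshape(2, 3), np.array([0, 1])

X, y = load_dataset()    # first call: runs the loader and caches the result
X2, y2 = load_dataset()  # later calls: load the memmapped copy from disk
```

The memmapped reload is what makes repeated runs cheap: only the pages you actually touch are read from disk.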
You can see where the datasets are stored like so:

```python
>>> datasets.get_data_home()
```
Benchmarks