Image Classification Dataset Generator

ICGen

Installation

The Package

git clone https://github.com/automl/ICGen.git
pip install ICGen/

Downloading the Datasets

To download individual datasets, run

python -m icgen.download --data_path DATA_PATH --datasets D1 D2 D3

or download a complete group directly with

python -m icgen.download --data_path DATA_PATH --dataset_group GROUP  # all, train, dev, test

For a list of available datasets, run

python -m icgen.dataset_names
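If the download should be triggered from a script rather than a shell, the command line can be assembled programmatically. The following is a minimal sketch that only uses the flags shown above; the helper name `build_download_command` is hypothetical and not part of icgen, and treating `--datasets` and `--dataset_group` as mutually exclusive is an assumption.

```python
import sys
from typing import List, Optional, Sequence


def build_download_command(
    data_path: str,
    datasets: Optional[Sequence[str]] = None,
    dataset_group: Optional[str] = None,
) -> List[str]:
    """Build the argv list for `python -m icgen.download` (hypothetical helper)."""
    # Assumption: exactly one of the two selection flags is passed.
    if bool(datasets) == bool(dataset_group):
        raise ValueError("Pass exactly one of `datasets` or `dataset_group`.")
    command = [sys.executable, "-m", "icgen.download", "--data_path", data_path]
    if datasets:
        command += ["--datasets", *datasets]
    else:
        command += ["--dataset_group", dataset_group]
    return command


if __name__ == "__main__":
    command = build_download_command("datasets", datasets=["cifar10", "cifar100"])
    print(command)
    # To actually start the download:
    # import subprocess; subprocess.run(command, check=True)
```

Using `sys.executable` keeps the download in the same Python environment in which icgen was installed.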

Usage

Sampling Tasks

import icgen
dataset_generator = icgen.ICDatasetGenerator(
  data_path="datasets",  # Replace with the data_path you downloaded the datasets to
  min_resolution=16,
  max_resolution=512,
  max_log_res_deviation=1,  # Sample resolutions at most 1 log step away from the native one
  min_classes=2,
  max_classes=100,
  min_examples_per_class=20,
  max_examples_per_class=100_000,
)
dev_data, test_data, dataset_info = dataset_generator.get_dataset(dataset="cifar10", augment=True)

The augment parameter controls whether the original dataset is modified.

These options only affect sampling when augment=True, and the min/max ranges do not filter out datasets.
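To make the interplay of min_resolution, max_resolution, and max_log_res_deviation more concrete, here is a minimal sketch of one plausible reading of these options. It is not icgen's actual implementation: the function `sample_resolution`, the use of base-2 logarithms, and the uniform draw in log space are all assumptions made for illustration.

```python
import math
import random


def sample_resolution(
    native_resolution,
    min_resolution=16,
    max_resolution=512,
    max_log_res_deviation=1,
    rng=None,
):
    """Draw a resolution near the native one (hypothetical, base-2 logs assumed)."""
    rng = rng or random.Random()
    log_native = math.log2(native_resolution)
    log_min, log_max = math.log2(min_resolution), math.log2(max_resolution)
    # Stay within the allowed deviation and within the global min/max range.
    low = max(log_min, log_native - max_log_res_deviation)
    high = min(log_max, log_native + max_log_res_deviation)
    if low > high:
        # The native resolution lies too far outside the range: clamp instead
        # of rejecting, since the ranges are not meant to filter datasets.
        low = high = min(max(log_native, log_min), log_max)
    return int(round(2 ** rng.uniform(low, high)))


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_resolution(32, rng=rng) for _ in range(5)])
```

Under these assumptions, a dataset with a native resolution of 32 yields sampled resolutions between 16 and 64.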

The data is returned at its original resolution, so that the user can resize it once.
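Resizing a single time, directly from the original resolution to the target one, avoids the quality loss of repeated resampling. The sketch below illustrates the idea with nearest-neighbor resizing on a plain nested-list image; the image representation and the helper `resize_nearest` are assumptions for illustration, and in practice an image library would usually do this step.

```python
def resize_nearest(image, target_resolution):
    """Resize a 2D image (list of rows) to a square target with nearest-neighbor.

    Hypothetical helper; the actual data format returned by icgen may differ.
    """
    height, width = len(image), len(image[0])
    # Map every target row/column back to its source row/column.
    rows = [min(height - 1, (i * height) // target_resolution) for i in range(target_resolution)]
    cols = [min(width - 1, (j * width) // target_resolution) for j in range(target_resolution)]
    return [[image[r][c] for c in cols] for r in rows]


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    for row in resize_nearest(image, 4):
        print(row)
```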

You can also sample from a list of datasets:

task = dataset_generator.get_dataset(datasets=["cifar100", "emnist/balanced"], augment=True)

We provide some lists of available datasets

import icgen
icgen.DATASETS_TRAIN
icgen.DATASETS_VAL
icgen.DATASETS_TEST
icgen.DATASETS

Reconstructing and Distributing Tasks

In distributed applications, it may be necessary to sample datasets on one machine and then use them on another. Similarly, for reproducibility, it may be necessary to store exactly which dataset was used. For these cases, icgen uses a dataset identifier that uniquely identifies a sampled dataset.
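The general idea of such an identifier can be sketched as a small, serializable record that one machine writes and another reads back. The class below is purely illustrative: `TaskIdentifier` and all of its fields are hypothetical and do not reflect icgen's actual identifier or API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskIdentifier:
    """Hypothetical stand-in for a dataset identifier; not icgen's actual class."""

    dataset: str
    resolution: int
    num_classes: int
    num_examples_per_class: int
    seed: int

    def to_json(self) -> str:
        # Sorted keys give a stable string that is safe to log or compare.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "TaskIdentifier":
        return cls(**json.loads(payload))


if __name__ == "__main__":
    identifier = TaskIdentifier("cifar10", 32, 10, 500, seed=0)
    payload = identifier.to_json()  # send to another machine or store on disk
    restored = TaskIdentifier.from_json(payload)
    print(restored == identifier)
```

Because the record holds only plain values, it can be stored next to experiment results and used later to request the same task again.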

License

MIT
