Skip to main content

mini-imagenet dataset transformed to fit classification task or keep the format for meta-learning tasks.

Project description

mini-ImageNet Logo mini-ImageNet Logo

The project Machine Learning CLassiFication (MLclf)

License: MIT Python PyPI Downloads

The purpose of this project is:

  1. transform the mini-imagenet dataset which is initially created for the few-shot learning to the format that fit the classical classification task. You can also use this package to download and obtain the raw data of the mini-imagenet dataset (for few-shot learning tasks).
  1. transform the tiny-imagenet dataset to the format that fit the classical classification task, which can be more easily used (being able to directly input to the Pytorch dataloader) compared to the original raw format.

The original dataset of mini-imagenet includes totally 100 classes, but due to its intention to meta-learning or few-shot learning, the train/validation/test dataset contains different classes. They have respectively 64/16/20 classes.

The original dataset of tiny-imagenet includes totally 200 classes, the train/validation/test dataset contains all classes. They have respectively 100000/10000/10000 images. For example, the training dataset has 500 images for each class.

In order to make the mini/tiny-imagenet dataset fit the format requirement for the classical classification task. MLclf made a proper transformation (recombination and splitting) of the original mini/tiny-imagenet dataset.

The transformed dataset of mini-imagenet is divided into train, validation and test dataset, each dataset of which includes 100 classes. Each image has the size 84x84 pixels with 3 channels.

The transformed dataset of tiny-imagenet is divided into train, validation and test dataset, each dataset of which includes 200 classes. Each image has the size 64x64 pixels with 3 channels.

Notice: The provider of tiny-imagenet dataset does not public the labels of testing dataset, so there is no labels for the original raw testing dataset.

The MLclf package can be found at: https://github.com/tiger2017/MLclf or at: https://pypi.org/project/MLclf/

Welcome to create an issue to the repository of MLclf on GitHub, and I will add more datasets loading functions based on the issues.

The mini-imagenet source data can be also accessed from: https://deepai.org/dataset/imagenet (there is no need to manually download it if you use MLclf).

Summary

Requirements

  • Python 3.x
  • numpy
  • torchvision

Installation

How to install MLclf package:

pip install MLclf

Usage

How to use this package for mini-imagenet:

from MLclf import MLclf
import torch
import torchvision.transforms as transforms

# Download the original mini-imagenet data:
MLclf.miniimagenet_download(Download=True) # only need to run this line before you download the mini-imagenet dataset for the first time.

# Transform the original data into the format that fits the task for classification:
# Note: If you want to keep the data format as the same as that for the meta-learning or few-shot learning (original format), just set ratio_train=0.64, ratio_val=0.16, shuffle=False.

transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
# The argument transform is a optional keyword. You can also set transform = None or simply not set transform, if you do not want the data being standardized and only want a normalization b/t [0,1].
train_dataset, validation_dataset, test_dataset = MLclf.miniimagenet_clf_dataset(ratio_train=0.6, ratio_val=0.2, seed_value=None, shuffle=True, transform=transform, save_clf_data=True)

# The dataset can be transformed to dataloader via torch: 

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=128, shuffle=True, num_workers=0)


# You can check the corresponding relations between labels and label_marks of the image data:
# (Note: The relations can be obtained after MLclf.miniimagenet_clf_dataset is called, otherwise they will be returned as None instead.)

labels_to_marks = MLclf.labels_to_marks['mini-imagenet']
marks_to_labels = MLclf.marks_to_labels['mini-imagenet']

You can also obtain the raw data of mini-imagenet from the downloaded pkl files:

from MLclf import MLclf

# The raw data of mini-imagenet can be also obtained via the function below:

data_raw_train, data_raw_val, data_raw_test = MLclf.miniimagenet_data_raw()

How to use this package for tiny-imagenet for the traditional classification task (similarly as mini-imagenet):

from MLclf import MLclf
import torch
import torchvision.transforms as transforms

MLclf.tinyimagenet_download(Download=True) # only need to run this line before you download the tiny-imagenet dataset for the first time.
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_dataset, validation_dataset, test_dataset = MLclf.tinyimagenet_clf_dataset(ratio_train=0.6, ratio_val=0.2,
                                                                                     seed_value=None, shuffle=True,
                                                                                     transform=transform,
                                                                                     save_clf_data=True,
                                                                                     few_shot=False)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=5, shuffle=True, num_workers=0)

# You can check the corresponding relations between labels and label_marks of the image data:
# (Note: The relations can be obtained after MLclf.miniimagenet_clf_dataset is called, otherwise they will be returned as None instead.)

labels_to_marks = MLclf.labels_to_marks['tiny-imagenet']
marks_to_labels = MLclf.marks_to_labels['tiny-imagenet']


data_raw_train, data_raw_val, data_raw_test = MLclf.tinyimagenet_data_raw()

If you want to use tiny-imagenet for the few-shot learning task, just change few_shot=True, for example:

train_dataset, validation_dataset, test_dataset = MLclf.tinyimagenet_clf_dataset(ratio_train=0.6, ratio_val=0.2,
                                                                                     seed_value=None, shuffle=True,
                                                                                     transform=transform,
                                                                                     save_clf_data=True,
                                                                                     few_shot=True)
# only original training dataset is used as the whole dataset of the few-shot learning task, so 200 classes in total,
# and in this few-shot learning task's example, 120 classes as training dataset, 40 classes as validation dataset and 40 classes as testing dataset, with 500 images for each class.

Here is a random joke that'll make you laugh!

Jokes Card

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

MLclf-0.2.14.tar.gz (9.7 kB view hashes)

Uploaded Source

Built Distribution

MLclf-0.2.14-py3-none-any.whl (10.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page