MultNIST Dataset

posted on 2023-11-30, 08:48 authored by David Towers, Rob Geada, Amir Atapour Abarghouei, Andrew Stephen McGough

Dataset containing the images and labels for the MultNIST data used in the CVPR NAS workshop Unseen-data challenge under the codename "Mateo".

The MultNIST dataset is constructed from MNIST images. It is intended to require machine learning models to go beyond image classification and also perform a calculation, in this case multiplication followed by a mod operation. For each image, three MNIST images were randomly chosen and combined through the colour channels, giving a three-channel colour image in which each MNIST image occupies one colour channel.
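
The channel-wise combination can be illustrated with a minimal NumPy sketch. This is not the authors' generation code: the function name `combine_digits` is hypothetical, and random arrays stand in for real MNIST digits.

```python
import numpy as np

def combine_digits(red, green, blue):
    """Stack three 28x28 greyscale images into one channels-first (3, 28, 28) image."""
    return np.stack([red, green, blue], axis=0)

rng = np.random.default_rng(0)
# Assumption: random uint8 arrays are placeholders for three MNIST digit images.
red, green, blue = (
    rng.integers(0, 256, size=(28, 28), dtype=np.uint8) for _ in range(3)
)

image = combine_digits(red, green, blue)
print(image.shape)  # (3, 28, 28)
```

Each source image remains recoverable from its own channel, e.g. `image[0]` is the red-channel digit.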

The data is in a channels-first format with a shape of (n, 3, 28, 28) where n is the number of samples in the corresponding set (50,000 for training, 10,000 for validation, and 10,000 for testing).
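
Some frameworks and plotting tools expect a channels-last layout instead. A minimal sketch of the conversion, assuming the data has already been loaded as a NumPy array (the helper name `to_channels_last` and the zero-filled placeholder batch are illustrative only):

```python
import numpy as np

def to_channels_last(batch):
    """Convert an (n, 3, 28, 28) array to (n, 28, 28, 3)."""
    return np.transpose(batch, (0, 2, 3, 1))

# Assumption: a small zero-filled batch stands in for the real data.
batch = np.zeros((4, 3, 28, 28), dtype=np.uint8)
batch_last = to_channels_last(batch)
print(batch_last.shape)  # (4, 28, 28, 3)
```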

There are ten classes in the dataset, each with 7,000 examples, distributed evenly across the three subsets.

The label of each image is generated using the formula "(r * g * b) % 10", where r, g, and b are the digits shown in the red, green, and blue colour channels respectively. For example, a MultNIST image with an RGB configuration of 3, 7, and 4 has a label of 4 ((3 * 7 * 4) % 10 = 84 % 10).
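
The label formula can be written as a short Python function. The name `multnist_label` is hypothetical and is used here only to reproduce the worked example above.

```python
def multnist_label(r, g, b):
    """Return the MultNIST label for the digits in the red, green, and blue channels."""
    return (r * g * b) % 10

print(multnist_label(3, 7, 4))  # 4, since 3 * 7 * 4 = 84 and 84 % 10 = 4
```

Note that any image containing the digit 0 in one of its channels has a label of 0, since the product is then 0.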
