Persistent Identifier
|
doi:10.18710/4F4KJS |
Publication Date
|
2023-03-29 |
Title
| Supporting Data for: UltraMNIST Classification: A Benchmark to Train CNNs for Very Large Images |
Alternative URL
| https://www.kaggle.com/c/ultra-mnist |
Author
| Gupta, Deepak K. (UiT The Arctic University of Norway)
Bamba, Udbhav
Thakur, Abhishek
Gupta, Akash
Sharan, Suraj
Demir, Ertugrul
Prasad, Dilip K. (UiT The Arctic University of Norway) - ORCID: 0000-0002-3693-6973 |
Point of Contact
|
Prasad, Dilip K. (UiT The Arctic University of Norway) |
Description
| Convolutional neural network (CNN) approaches available in the current literature are designed to work primarily with low-resolution images. When applied to very large images, they face challenges related to GPU memory, a receptive field smaller than needed for semantic correspondence, and the need to incorporate multi-scale features. The resolution of input images can be reduced, but only with significant loss of critical information. Based on these issues, we introduce a novel research problem of training CNN models for very large images and present the 'UltraMNIST dataset', a simple yet representative benchmark for this task. UltraMNIST is built from the popular MNIST digits, with additional levels of complexity added to replicate the challenges of real-world problems. We present two variants of the problem: 'UltraMNIST classification' and 'Budget-aware UltraMNIST classification'. The standard UltraMNIST classification benchmark is intended to facilitate the development of novel CNN training methods that make effective use of the best available GPU resources. The budget-aware variant is intended to promote the development of methods that work under constrained GPU memory. To support the development of competitive solutions, we present several baseline models for the standard benchmark and its budget-aware variant. We study the effect of reducing resolution on performance and report results for baseline models with pretrained backbones drawn from popular state-of-the-art models. Finally, with the presented benchmark dataset and baselines, we hope to pave the way for a new generation of CNN methods that handle large images in an efficient and resource-light manner. The UltraMNIST dataset comprises very large images, each 4000x4000 pixels with 3-5 digits per image. Each of these digits has been extracted from the original MNIST dataset.
The task is to predict the sum of the digits in each image; this sum can be anything from 0 to 27. (2022-04-15) |
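As a minimal sketch of the data layout described above, the snippet below composes a toy UltraMNIST-style sample with NumPy (the only library listed under Software): 3-5 digit-sized stamps pasted at random positions onto a blank 4000x4000 canvas, with the label being the sum of the digit values. Random noise patches stand in for real MNIST digits here, and unlike the actual dataset this toy version does not vary the digit scale or constrain the digit values so that the sum stays within 0-27; the function name and all parameters are illustrative, not part of the published dataset tooling.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_ultramnist_style_sample(canvas_size=4000, digit_size=28):
    """Return (image, label) for a toy UltraMNIST-style sample.

    Pastes 3-5 digit-sized stamps onto a blank canvas; the label is
    the sum of the (randomly drawn) digit values. Noise patches stand
    in for real MNIST digit crops.
    """
    canvas = np.zeros((canvas_size, canvas_size), dtype=np.uint8)
    n_digits = int(rng.integers(3, 6))              # 3-5 digits per image
    values = rng.integers(0, 10, size=n_digits)     # digit values 0-9
    for _ in range(n_digits):
        stamp = rng.integers(0, 256, size=(digit_size, digit_size),
                             dtype=np.uint8)
        y = int(rng.integers(0, canvas_size - digit_size))
        x = int(rng.integers(0, canvas_size - digit_size))
        canvas[y:y + digit_size, x:x + digit_size] = stamp
    return canvas, int(values.sum())                # label = sum of digits

img, label = make_ultramnist_style_sample()
# A crude resolution-reduction baseline, as studied in the abstract,
# can be approximated by strided subsampling, e.g. img[::8, ::8].
```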
Subject
| Medicine, Health and Life Sciences; Computer and Information Science |
Keyword
| Large Scale Image Classification
Ultra Large MNIST dataset
Variable scale features
CNN classification |
Related Publication
| Gupta, D. K., Bamba, U., Thakur, A., Gupta, A., Sharan, S., Demir, E., & Prasad, D. K. (2022). UltraMNIST Classification: A Benchmark to Train CNNs for Very Large Images. doi: 10.48550/arXiv.2206.12681 https://doi.org/10.48550/arXiv.2206.12681 |
Language
| English |
Producer
| UiT The Arctic University of Norway (UiT) https://en.uit.no/ |
Contributor
| Researcher : Gupta, Deepak K.
Researcher : Bamba, Udbhav
Researcher : Thakur, Abhishek
Researcher : Gupta, Akash
Researcher : Sharan, Suraj
Researcher : Demir, Ertugrul
Researcher : Prasad, Dilip K.
Other : Nirwan Banerjee |
Funding Information
| The Research Council of Norway: grant no. 325741
UiT The Arctic University of Norway: Cristin project id 2061348 |
Distributor
| UiT The Arctic University of Norway (UiT The Arctic University of Norway) https://dataverse.no/dataverse/uit |
Depositor
| Banerjee, Nirwan |
Deposit Date
| 2023-03-06 |
Date of Collection
| Start Date: 2022-04-15 |
Data Type
| Image data; Large Scale data; Variable Scale Feature data |
Series
| Ultra-MNIST: https://www.kaggle.com/competitions/ultra-mnist |
Software
| Python, Version: 3
NumPy |
Data Source
| https://www.kaggle.com/competitions/ultra-mnist |