EMNIST
Classification
MNIST
License: Unknown

Overview

The EMNIST dataset is a set of handwritten characters and digits derived from the NIST Special
Database 19 and converted to a 28x28 pixel image format and dataset structure that directly
matches the MNIST dataset. Further information on the dataset contents and conversion process
can be found in the EMNIST paper (see Citation below).

Dataset Summary

The dataset is provided in six different splits, summarized below:

  • EMNIST ByClass: 814,255 characters. 62 unbalanced classes.
  • EMNIST ByMerge: 814,255 characters. 47 unbalanced classes.
  • EMNIST Balanced: 131,600 characters. 47 balanced classes.
  • EMNIST Letters: 145,600 characters. 26 balanced classes.
  • EMNIST Digits: 280,000 characters. 10 balanced classes.
  • EMNIST MNIST: 70,000 characters. 10 balanced classes.
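For the balanced splits, the number of samples per class follows directly from the totals above: divide the character count by the class count. The short Python sketch below does that arithmetic using only the figures listed in the summary. The names SPLITS and samples_per_class are illustrative, and each total is taken as given for the whole split (the summary does not break it into training and test portions).

```python
# Split sizes and class counts as listed in the summary above:
# name -> (total characters, number of classes, balanced?)
SPLITS = {
    "ByClass": (814_255, 62, False),
    "ByMerge": (814_255, 47, False),
    "Balanced": (131_600, 47, True),
    "Letters": (145_600, 26, True),
    "Digits": (280_000, 10, True),
    "MNIST": (70_000, 10, True),
}


def samples_per_class(total: int, num_classes: int) -> int:
    """Return the per-class sample count for a balanced split.

    Raises ValueError when the total does not divide evenly by the class
    count, since an equal number of samples per class is then impossible.
    """
    per_class, remainder = divmod(total, num_classes)
    if remainder:
        raise ValueError(
            f"{total} samples cannot be split evenly over {num_classes} classes"
        )
    return per_class


if __name__ == "__main__":
    for name, (total, classes, balanced) in SPLITS.items():
        if balanced:
            print(f"{name}: {samples_per_class(total, classes):,} per class")
```

Running it prints 2,800 samples per class for Balanced, 5,600 for Letters, 28,000 for Digits and 7,000 for MNIST. The unbalanced ByClass and ByMerge totals do not divide evenly by their class counts, so the helper raises ValueError for them.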

The full complement of the NIST Special Database 19 is available in the ByClass and ByMerge
splits. The EMNIST Balanced dataset contains a set of characters with an equal number of samples
per class. The EMNIST Letters dataset merges a balanced set of the uppercase and lowercase letters
into a single 26-class task. The EMNIST Digits and EMNIST MNIST datasets provide balanced
handwritten digit datasets directly compatible with the original MNIST dataset.
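The class counts can be reconstructed from the character groups involved. ByClass keeps one class per digit, uppercase letter and lowercase letter (10 + 26 + 26 = 62). ByMerge reaches 47 classes by folding the lowercase form of 15 letters, whose upper- and lowercase handwriting is hard to tell apart, into the uppercase class (62 - 15 = 47), while Letters keeps one class per letter. The merged-letter set below is an assumption taken from the EMNIST paper's description rather than from this page; check it against the mapping files shipped with the dataset before relying on it.

```python
import string

# ByClass: one class per digit, uppercase letter and lowercase letter.
DIGITS = list(string.digits)            # '0'..'9' -> 10 classes
UPPER = list(string.ascii_uppercase)    # 'A'..'Z' -> 26 classes
LOWER = list(string.ascii_lowercase)    # 'a'..'z' -> 26 classes
BYCLASS = DIGITS + UPPER + LOWER

# ASSUMPTION (from the EMNIST paper, not this page): lowercase letters whose
# handwritten shape matches the uppercase form are merged into that class.
MERGED_LETTERS = set("cijklmopsuvwxyz")

# ByMerge: digits, all uppercase letters, and only the unmerged lowercase ones.
BYMERGE = DIGITS + UPPER + [c for c in LOWER if c not in MERGED_LETTERS]

# Letters: the upper- and lowercase forms of a letter share a single class.
LETTERS = [c.upper() for c in LOWER]

print(len(BYCLASS), len(MERGED_LETTERS), len(BYMERGE), len(LETTERS))  # 62 15 47 26
```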

Please refer to the EMNIST paper for further details of the dataset structure.

Data Format

The dataset is provided in two file formats. Both versions contain identical information and are
provided purely for convenience. The first version is in MATLAB format, which can be read from
both MATLAB and Python (using the scipy.io.loadmat function). The second version uses the same
binary format as the original MNIST dataset, as described at:

http://yann.lecun.com/exdb/mnist/
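The MNIST binary format is the IDX layout described on the MNIST page: a 4-byte magic number whose first two bytes are zero, whose third byte gives the element type (0x08 for unsigned bytes) and whose fourth byte gives the number of dimensions, followed by one big-endian 32-bit size per dimension and then the raw data. Image files therefore start with 0x00000803 (2051) and label files with 0x00000801 (2049). Below is a minimal standard-library sketch of a reader, assuming unsigned-byte files; parse_idx and image_rows are hypothetical helper names, not part of any EMNIST tooling.

```python
import struct
from typing import List, Tuple

# IDX element-type code for unsigned bytes, used by MNIST-style image and
# label files. Other element types are deliberately not handled here.
UBYTE = 0x08


def parse_idx(data: bytes) -> Tuple[Tuple[int, ...], bytes]:
    """Parse an unsigned-byte IDX buffer and return (shape, raw element bytes)."""
    if len(data) < 4:
        raise ValueError("buffer too short for IDX magic number")
    zero1, zero2, dtype, ndim = struct.unpack(">BBBB", data[:4])
    if zero1 != 0 or zero2 != 0:
        raise ValueError("bad magic number: first two bytes must be zero")
    if dtype != UBYTE:
        raise ValueError(f"unsupported element type 0x{dtype:02x}")
    header_end = 4 + 4 * ndim
    if len(data) < header_end:
        raise ValueError("buffer too short for IDX dimension sizes")
    # One big-endian unsigned 32-bit size per dimension.
    shape = struct.unpack(f">{ndim}I", data[4:header_end])
    count = 1
    for dim in shape:
        count *= dim
    body = data[header_end:header_end + count]
    if len(body) != count:
        raise ValueError("buffer shorter than the declared shape")
    return shape, body


def image_rows(shape: Tuple[int, ...], body: bytes, index: int) -> List[List[int]]:
    """Return image `index` of a 3-D (count, rows, cols) body as a list of pixel rows."""
    _, rows, cols = shape
    start = index * rows * cols
    return [list(body[start + r * cols:start + (r + 1) * cols]) for r in range(rows)]


if __name__ == "__main__":
    # Synthetic image file: magic 0x00000803 (ubyte, 3 dims), shape 2 x 2 x 3.
    header = struct.pack(">BBBB3I", 0, 0, UBYTE, 3, 2, 2, 3)
    shape, body = parse_idx(header + bytes(range(12)))
    print(shape)                        # (2, 2, 3)
    print(image_rows(shape, body, 1))   # [[6, 7, 8], [9, 10, 11]]
```

For a real file, pass the bytes from open(path, "rb").read() to parse_idx; if the downloaded files are gzip-compressed, gzip.open(path, "rb").read() can be used instead.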

Citation

Please use the following citation when referencing the dataset:

@inproceedings{cohen2017emnist,
  title={EMNIST: Extending MNIST to handwritten letters},
  author={Cohen, Gregory and Afshar, Saeed and Tapson, Jonathan and Van Schaik, Andre},
  booktitle={2017 International Joint Conference on Neural Networks (IJCNN)},
  pages={2921--2926},
  year={2017},
  organization={IEEE}
}
Data Summary

Type: Image
Amount: --
Size: 1.22GB
Provided by: Western Sydney University

Western Sydney University is an Australian multi-campus university in the Greater Western region of Sydney. It was ranked in the top 400 in the world in the 2014 QS World University Rankings and 19th in Australia in 2015.