VoxCeleb2
Classification
Voice Print Recognition
License: CC BY 4.0

Overview

VoxCeleb2 contains over 1 million utterances from over 6,000 celebrities, extracted from videos uploaded to YouTube. The dataset is fairly gender balanced, with 61% of the speakers male. The speakers span a wide range of ethnicities, accents, professions and ages.

The videos are shot in a large number of challenging visual and auditory environments. These include interviews on red carpets, in outdoor stadiums and in quiet indoor studios, speeches given to large audiences, excerpts from professionally shot multimedia, and even crude videos shot on hand-held devices. The audio segments are degraded by background chatter, laughter, overlapping speech and varying room acoustics. We also provide face detections and face-tracks for the speakers in the dataset, and the face images are similarly ‘in the wild’, with variations in pose (including profiles), lighting, image quality and motion blur. In the paper cited below, Table 1 gives the general statistics, and Figure 1 shows examples of cropped faces as well as utterance length, gender and nationality distributions.

The dataset contains both development (train/val) and test sets. However, since the VoxCeleb1 dataset is used for testing, only the development set is used for the speaker recognition task (Sections 4 and 5 of the paper). The VoxCeleb2 test set should prove useful for other applications of audio-visual learning for which the dataset might be used. The split is given in Table 2 of the paper. The development set of VoxCeleb2 has no overlap with the identities in the VoxCeleb1 or SITW datasets.
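The no-overlap property between the development identities and an evaluation set can be verified with a simple set intersection. The following is a minimal sketch, not part of the dataset tooling: the function name `overlapping_identities` and the speaker labels are hypothetical, and it assumes both lists identify speakers with the same naming scheme (for example, the person's name rather than dataset-specific IDs).

```python
from typing import Iterable, Set


def overlapping_identities(dev_ids: Iterable[str], eval_ids: Iterable[str]) -> Set[str]:
    """Return the speaker identities that appear in both collections."""
    return set(dev_ids) & set(eval_ids)


# Hypothetical identity labels, for illustration only.
dev_speakers = ["speaker_a", "speaker_b", "speaker_c"]
eval_speakers = ["speaker_x", "speaker_y"]

leaked = overlapping_identities(dev_speakers, eval_speakers)
if leaked:
    print(f"{len(leaked)} identities appear in both sets: {sorted(leaked)}")
else:
    print("No identity overlap between the two sets.")
```

An empty result means no identity is shared, so a model trained on the first list is evaluated only on unseen speakers from the second.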

Citation

@InProceedings{Chung18b,
  author       = "Chung, J.~S. and Nagrani, A. and Zisserman, A.",
  title        = "VoxCeleb2: Deep Speaker Recognition",
  booktitle    = "INTERSPEECH",
  year         = "2018",
}

License

CC BY 4.0

Data Summary
Type
Audio, Video
Amount
--
Size
--
Provided by
Seebibyte
Seebibyte: Visual Search for the Era of Big Data is a large research project based in the Department of Engineering Science, University of Oxford. It is funded by the Engineering and Physical Sciences Research Council (EPSRC) and runs from 2015 to 2020.