The LJ Speech Dataset
License: Custom
Overview
This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.
Data Collection
Statistic | Value |
---|---|
Total Clips | 13,100 |
Total Words | 225,715 |
Total Characters | 1,308,678 |
Total Duration (hh:mm:ss) | 23:55:17 |
Mean Clip Duration | 6.57 sec |
Min Clip Duration | 1.11 sec |
Max Clip Duration | 10.10 sec |
Mean Words per Clip | 17.23 |
Distinct Words | 13,821 |
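The figures in the table are mutually consistent: 23:55:17 is 86,117 seconds, 86,117 / 13,100 ≈ 6.57 sec per clip, and 225,715 / 13,100 ≈ 17.23 words per clip. The Python sketch below shows how statistics of this kind could be derived from per-clip durations and transcripts. The names summarize, CorpusStats and hms_to_seconds are hypothetical, and the word-counting rule (whitespace-separated tokens, with distinct words lowercased and stripped of surrounding punctuation) is an assumption; the dataset does not document how its counts were produced, so this sketch will not necessarily reproduce the exact totals above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CorpusStats:
    total_clips: int
    total_words: int
    total_characters: int
    total_duration_sec: float
    mean_clip_duration_sec: float
    min_clip_duration_sec: float
    max_clip_duration_sec: float
    mean_words_per_clip: float
    distinct_words: int


def hms_to_seconds(hms: str) -> int:
    """Convert an 'hh:mm:ss' string (as in the Total Duration row) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarize(durations: Sequence[float], transcripts: Sequence[str]) -> CorpusStats:
    """Compute table-style statistics from per-clip durations and transcripts.

    Assumed counting rule (hypothetical): words are whitespace-separated
    tokens; distinct words are lowercased with surrounding punctuation removed.
    """
    if not durations or len(durations) != len(transcripts):
        raise ValueError("need exactly one duration per transcript")
    words_per_clip = [t.split() for t in transcripts]
    total_words = sum(len(words) for words in words_per_clip)
    distinct = {
        w.strip(".,;:!?\"'()").lower()
        for words in words_per_clip
        for w in words
    }
    distinct.discard("")  # tokens made only of punctuation
    n = len(durations)
    total_duration = sum(durations)
    return CorpusStats(
        total_clips=n,
        total_words=total_words,
        total_characters=sum(len(t) for t in transcripts),
        total_duration_sec=total_duration,
        mean_clip_duration_sec=total_duration / n,
        min_clip_duration_sec=min(durations),
        max_clip_duration_sec=max(durations),
        mean_words_per_clip=total_words / n,
        distinct_words=len(distinct),
    )


if __name__ == "__main__":
    # Cross-check the published totals against the published means.
    total = hms_to_seconds("23:55:17")   # 86117 seconds
    print(round(total / 13_100, 2))      # 6.57, the Mean Clip Duration row
    print(round(225_715 / 13_100, 2))    # 17.23, the Mean Words per Clip row
```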
Data Format
Metadata is provided in transcripts.csv. This file contains one record per line, with fields separated by the pipe character | (0x7c). The fields are:
- ID: the name of the corresponding .wav file.
- Transcription: words spoken by the reader (UTF-8).
- Normalized Transcription: transcription with numbers, ordinals, and monetary units expanded into full words (UTF-8).
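A minimal Python sketch for reading this metadata, assuming transcripts.csv is a UTF-8 text file with exactly three pipe-separated fields per line. Record, parse_metadata and load_metadata are hypothetical names, not part of any published tooling for this dataset. Quoting is disabled (csv.QUOTE_NONE) so that quotation marks inside transcripts are kept as literal text instead of being interpreted as CSV quoting, since the format described above separates fields only with the pipe character.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Record:
    id: str                        # name of the corresponding .wav file
    transcription: str             # words spoken by the reader
    normalized_transcription: str  # numbers, ordinals, money expanded to words


def parse_metadata(lines: Iterable[str]) -> List[Record]:
    """Parse pipe-delimited metadata lines into Record objects."""
    # QUOTE_NONE: quote characters in transcripts are ordinary text.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        if len(row) != 3:
            raise ValueError(f"expected 3 fields, got {len(row)}: {row!r}")
        records.append(Record(*row))
    return records


def load_metadata(path: str) -> List[Record]:
    """Read a metadata file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8", newline="") as f:
        return parse_metadata(f)


if __name__ == "__main__":
    # Illustrative line only; this text is not taken from the dataset.
    sample = ['LJ001-0001|He paid $5 for "it".|He paid five dollars for "it".\n']
    for rec in parse_metadata(sample):
        print(rec.id, "->", rec.normalized_transcription)
```

Raising on a line with the wrong number of fields, rather than silently skipping it, makes a stray pipe character inside a transcript visible immediately.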
Each audio file is a single-channel 16-bit PCM WAV with a sample rate of 22050 Hz.
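The stated audio format can be checked with Python's standard-library wave module, which also yields each clip's duration as the number of frames divided by the sample rate. This is a sketch: clip_duration_if_valid and make_silent_clip are hypothetical helpers, and the demo builds a one-second silent clip in memory rather than reading a real file from the dataset. The wave module handles only PCM data, so a non-PCM file raises wave.Error when opened instead of reaching these checks.

```python
import io
import wave
from typing import BinaryIO, Union

EXPECTED_CHANNELS = 1             # single-channel
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit PCM
EXPECTED_SAMPLE_RATE_HZ = 22050


def clip_duration_if_valid(source: Union[str, BinaryIO]) -> float:
    """Return the clip duration in seconds, or raise ValueError on a format mismatch."""
    with wave.open(source, "rb") as w:
        if w.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"expected mono, got {w.getnchannels()} channels")
        if w.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            raise ValueError(f"expected 16-bit samples, got {8 * w.getsampwidth()}-bit")
        if w.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            raise ValueError(f"expected 22050 Hz, got {w.getframerate()} Hz")
        # Duration = frames / frames per second.
        return w.getnframes() / w.getframerate()


def make_silent_clip(seconds: float, channels: int = 1,
                     rate: int = EXPECTED_SAMPLE_RATE_HZ) -> io.BytesIO:
    """Build an in-memory 16-bit PCM WAV of silence (demonstration only)."""
    buf = io.BytesIO()
    n_frames = int(seconds * rate)
    with wave.open(buf, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"\x00\x00" * channels * n_frames)
    buf.seek(0)  # rewind so the clip can be read back
    return buf


if __name__ == "__main__":
    print(clip_duration_if_valid(make_silent_clip(1.0)))  # 1.0
```

Summing these durations over all clips gives the Total Duration figure in the statistics table, up to rounding.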
Citation
@misc{ljspeech17,
author = {Keith Ito and Linda Johnson},
title = {The LJ Speech Dataset},
howpublished = {\url{https://keithito.com/LJ-Speech-Dataset/}},
year = 2017
}