The LJ Speech Dataset
Text
NLP
License: Custom

Overview

This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker
reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips
vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.

Data Collection

Total Clips            13,100
Total Words            225,715
Total Characters       1,308,678
Total Duration         23:55:17
Mean Clip Duration     6.57 sec
Min Clip Duration      1.11 sec
Max Clip Duration      10.10 sec
Mean Words per Clip    17.23
Distinct Words         13,821
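
The two mean values can be cross-checked against the totals listed above. The following is a minimal sketch that uses only the published figures; the helper name hms_to_seconds is illustrative and not part of the dataset.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to a whole number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Totals copied from the statistics listed above.
TOTAL_CLIPS = 13_100
TOTAL_WORDS = 225_715
TOTAL_DURATION = "23:55:17"

total_seconds = hms_to_seconds(TOTAL_DURATION)
mean_clip_duration = total_seconds / TOTAL_CLIPS
mean_words_per_clip = TOTAL_WORDS / TOTAL_CLIPS

print(f"total duration: {total_seconds} s")
print(f"mean clip duration: {mean_clip_duration:.2f} s")
print(f"mean words per clip: {mean_words_per_clip:.2f}")
```

Both derived values round to the listed means (6.57 sec and 17.23 words), so the totals and the means are mutually consistent.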

Data Format

Metadata is provided in transcripts.csv. This file consists of one record per line, delimited
by the pipe character (0x7c). The fields are:

  1. ID: this is the name of the corresponding .wav file
  2. Transcription: words spoken by the reader (UTF-8)
  3. Normalized Transcription: transcription with numbers, ordinals, and monetary units expanded into full words (UTF-8)
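
The record layout described above can be read with Python's standard csv module. This is a minimal sketch, not an official loader: the Record type and the read_metadata function are hypothetical names, and the two sample lines are invented for illustration rather than taken from the dataset. Quoting is disabled on the assumption that quotation marks in a transcription are literal characters and not CSV quoting.

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Record(NamedTuple):
    clip_id: str          # field 1: name of the corresponding .wav file
    transcription: str    # field 2: words spoken by the reader
    normalized: str       # field 3: transcription with numbers expanded


def read_metadata(handle: TextIO) -> Iterator[Record]:
    """Yield one Record per pipe-delimited line of the metadata file."""
    # QUOTE_NONE keeps any '"' in the text as an ordinary character.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(row)}"
            )
        yield Record(*row)


# Invented sample data in the documented format (not real dataset lines).
sample = io.StringIO(
    "CLIP-0001|He paid $5 for 2 books.|He paid five dollars for two books.\n"
    'CLIP-0002|She said, "Wait."|She said, "Wait."\n'
)
records = list(read_metadata(sample))
for record in records:
    print(record.clip_id, "->", record.normalized)
```

To read the real file, pass a handle opened with open(path, encoding="utf-8", newline="") in place of the in-memory sample.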

Each audio file is a single-channel 16-bit PCM WAV with a sample rate of 22050 Hz.
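
The stated audio format can be verified per clip with Python's standard wave module. The sketch below is one possible check and is not shipped with the dataset: check_wav is a hypothetical helper, and the demonstration runs on a half second of silence generated in memory so that it does not depend on any dataset file.

```python
import io
import wave

# Format stated above: single channel, 16-bit PCM, 22050 Hz.
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH = 2      # bytes per sample (16 bits)
EXPECTED_SAMPLE_RATE = 22050   # Hz


def check_wav(source) -> float:
    """Validate a WAV file and return its duration in seconds.

    `source` is a file path string or a binary file object.
    """
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != EXPECTED_CHANNELS:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            raise ValueError(f"expected 16-bit, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != EXPECTED_SAMPLE_RATE:
            raise ValueError(f"expected 22050 Hz, got {wav.getframerate()} Hz")
        return wav.getnframes() / wav.getframerate()


# Build a synthetic clip: 11025 frames of silence = 0.5 s at 22050 Hz.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(EXPECTED_CHANNELS)
    out.setsampwidth(EXPECTED_SAMPLE_WIDTH)
    out.setframerate(EXPECTED_SAMPLE_RATE)
    out.writeframes(b"\x00\x00" * 11025)
buffer.seek(0)

duration = check_wav(buffer)
print(f"duration: {duration} s")
```

Summing the returned durations over all clips gives a figure that can be compared with the listed total duration.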

Citation

@misc{ljspeech17,
  author       = {Keith Ito and Linda Johnson},
  title        = {The LJ Speech Dataset},
  howpublished = {\url{https://keithito.com/LJ-Speech-Dataset/}},
  year         = 2017
}

License

Custom

Data Summary

Type: Audio
Amount: --
Size: 2.56 GB
Provided by: Keith Ito
Hi! I'm Keith Ito, a software engineer in sunny San Diego. My interests include data visualization, mobile apps, and machine learning. But mostly, I just like writing code that people (hopefully) find useful.
Copyright@Graviti