The data set contains several speakers. The 5 largest are listed individually, the rest are summarized as other. All audio files have a sampling rate of 44.1kHz. For each speaker, there is a clean variant in addition to the full data set, where the quality is even higher. Furthermore, there are various statistics. The dataset can also be used for automatic speech recognition (ASR) if audio files are converted to 16 kHz.
3 PAPERS • 2 BENCHMARKS
JamALT is a revision of the JamendoLyrics dataset (80 songs in 4 languages), adapted for use as an automatic lyrics transcription (ALT) benchmark.
1 PAPER • 5 BENCHMARKS
The M-AILABS Speech Dataset is the first large dataset that we are providing free-of-charge, freely usable as training data for speech recognition and speech synthesis. Most of the data is based on LibriVox and Project Gutenberg. The training data consist of nearly thousand hours of audio and the text-files in prepared format. A transcription is provided for each clip. Clips vary in length from 1 to 20 seconds and have a total length of approximately shown in the list (and in the respective info.txt-files) below. The texts were published between 1884 and 1964, and are in the public domain. The audio was recorded by the LibriVox project and is also in the public domain
1 PAPER • 1 BENCHMARK
The SWC is a corpus of aligned Spoken Wikipedia articles from the English, German, and Dutch Wikipedia. This corpus has several outstanding characteristics:
VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).