Speech Commands is an audio dataset of spoken words designed to help train and evaluate keyword spotting systems.
354 PAPERS • 4 BENCHMARKS
The TAU Urban Acoustic Scenes 2019 development dataset consists of 10-second audio segments from 10 acoustic scenes: airport, indoor shopping mall, metro station, pedestrian street, public square, street with a medium level of traffic, travelling by a tram, travelling by a bus, travelling by an underground metro, and urban park. Each acoustic scene has 1440 segments (240 minutes of audio). In total, the dataset contains 40 hours of audio.
13 PAPERS • 2 BENCHMARKS
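As a quick consistency check, the per-scene and total durations stated for TAU Urban Acoustic Scenes 2019 follow directly from the segment count and segment length. The minimal Python sketch below only reproduces that arithmetic. The constant and function names are illustrative and are not part of any official dataset tooling.

```python
# Figures taken from the TAU Urban Acoustic Scenes 2019 description above.
# Names are hypothetical, chosen only for this illustration.
NUM_SCENES = 10
SEGMENTS_PER_SCENE = 1440
SEGMENT_SECONDS = 10


def scene_minutes(segments: int = SEGMENTS_PER_SCENE,
                  seconds: int = SEGMENT_SECONDS) -> float:
    """Audio duration of one acoustic scene, in minutes."""
    return segments * seconds / 60


def total_hours(scenes: int = NUM_SCENES) -> float:
    """Total audio duration across all scenes, in hours."""
    return scenes * scene_minutes() / 60


print(scene_minutes())  # 240.0  (1440 x 10 s = 14400 s)
print(total_hours())    # 40.0   (10 x 240 min = 2400 min)
```

Both results match the 240 minutes per scene and 40 hours overall given in the description.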
VoxForge is an open speech dataset that was set up to collect transcribed speech for use with free and open-source speech recognition engines (on Linux, Windows, and Mac). Homepage: http://www.voxforge.org/home
11 PAPERS • 9 BENCHMARKS
EmoSpeech contains keywords spoken with diverse emotions, along with background sounds, and was introduced to explore new challenges in audio analysis.
1 PAPER • NO BENCHMARKS YET