The MEDIA French corpus is dedicated to semantic extraction from speech in the context of human/machine dialogue. It provides manual transcriptions and conceptual annotations of dialogues from 250 speakers. The corpus is split into three parts: (1) the training set (720 dialogues, 12K sentences), (2) the development set (79 dialogues, 1.3K sentences), and (3) the test set (200 dialogues, 3K sentences).
6 PAPERS • NO BENCHMARKS YET
The SmartSpeaker benchmark evaluates how well systems respond to music-player commands in both English and French. Its main difficulty is that many commands contain artist names or track titles that are unusual, such as “play music by [a boogie wit da hoodie]” or “I’d like to listen to [Kinokoteikoku]”.
3 PAPERS • 1 BENCHMARK