Europarl-ST is a multilingual Spoken Language Translation corpus containing paired audio-text samples for SLT from and into 9 European languages, for a total of 72 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012.
55 PAPERS • NO BENCHMARKS YET
MaSS (Multilingual corpus of Sentence-aligned Spoken utterances) is an extension of the CMU Wilderness Multilingual Speech Dataset, a speech dataset based on recorded readings of the New Testament.
15 PAPERS • 3 BENCHMARKS
The ROBIN Technical Acquisition Speech Corpus (ROBINTASC) was developed within the ROBIN project. Its main purpose was to improve the behaviour of a conversational agent, allowing human-machine interaction in the context of purchasing technical equipment. It contains over 6 hours of read speech in Romanian language. We provide text files, associated speech files (WAV, 44.1KHz, 16-bit, single channel), annotated text files in CoNLL-U format.
4 PAPERS • NO BENCHMARKS YET