Common Voice is an audio dataset in which each entry consists of a unique MP3 and a corresponding text file. The dataset contains 9,283 recorded hours, of which 7,335 hours are validated, across 60 languages. It also includes demographic metadata such as age, sex, and accent.
314 PAPERS • 164 BENCHMARKS
We present a multilingual test set for conducting speech intelligibility tests in the form of diagnostic rhyme tests. The materials currently contain audio recordings in 5 languages, and further extensions are in progress. For Mandarin Chinese, we provide recordings for a consonant contrast test as well as a tonal contrast test. Further information on the audio data, the test procedure, and the software for setting up a full survey that can be deployed on crowdsourcing platforms is provided in our paper [arXiv preprint] and GitHub repository. We welcome contributions to this open-source project.
1 PAPER • NO BENCHMARKS YET
The MCCS dataset is the first large-scale Mandarin Chinese Cued Speech dataset. It covers 23 major categories of scenarios (e.g., communication, transportation, and shopping) and 72 subcategories of scenarios (e.g., meeting, dating, and introduction). It was recorded by four skilled native Mandarin Chinese Cued Speech cuers using mobile phone cameras. The Cued Speech videos are recorded at 30 fps with a resolution of 1280x720. We provide the raw Cued Speech videos, a text file (with 1,000 sentences), and the corresponding annotations, which come in two kinds: continuous video annotations made with ELAN and discrete audio annotations made with Praat.
0 PAPERS • NO BENCHMARKS YET