The CMU Wilderness Multilingual Speech Dataset covers over 700 languages, providing audio, aligned text, and word pronunciations for each. On average, each language has around 20 hours of sentence-length transcriptions.
Source: Alan W Black, "CMU Wilderness Multilingual Speech Dataset," ICASSP 2019, Brighton, UK.