We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.
59 PAPERS • 1 BENCHMARK
OCNLI stands for Original Chinese Natural Language Inference. It is corpus for Chinese Natural Language Inference, collected following closely the procedures of MNLI, but with enhanced strategies aiming for more challenging inference pairs. No human/machine translation is used in creating the dataset, and thus the Chinese texts are original and not translated.
40 PAPERS • 3 BENCHMARKS