FLoRes-101 is an evaluation benchmark for low-resource and multilingual machine translation. It consists of 3001 sentences extracted from English Wikipedia, covering a variety of topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process.
72 PAPERS • 8 BENCHMARKS
FLoRes-200 doubles the language coverage of FLoRes-101. Because the new languages are less standardized and require more specialized professional translation, the verification process became more complex, which required modifications to the translation workflow. Several FLoRes-200 languages were not translated from English; they were translated instead from Spanish, French, Russian, or Modern Standard Arabic.
68 PAPERS • NO BENCHMARKS YET
Tatoeba is a free collection of example sentences with translations geared towards foreign language learners. It is available in more than 400 languages. Its name comes from the Japanese phrase “tatoeba” (例えば), meaning “for example”. It is written and maintained by a community of volunteers through a model of open collaboration. Individual contributors are known as Tatoebans.
37 PAPERS • 26 BENCHMARKS