DuReader is a large-scale open-domain Chinese machine reading comprehension dataset. The dataset consists of 200K questions, 420K answers and 1M documents. The questions and documents are based on Baidu Search and Baidu Zhidao. The answers are manually generated. The dataset additionally provides question type annotations – each question was manually annotated as either Entity, Description or YesNo and one of Fact or Opinion.
61 PAPERS • 4 BENCHMARKS
Multilingual Knowledge Questions and Answers (MKQA) is an open-domain question answering evaluation set comprising 10k question-answer pairs aligned across 26 typologically diverse languages (260k question-answer pairs in total). The goal of this dataset is to provide a challenging benchmark for question answering quality across a wide set of languages. Answers are based on a language-independent data representation, making results comparable across languages and independent of language-specific passages. With 26 languages, this dataset supplies the widest range of languages to-date for evaluating question answering.
37 PAPERS • NO BENCHMARKS YET
XQA is a data which consists of a total amount of 90k question-answer pairs in nine languages for cross-lingual open-domain question answering.
6 PAPERS • NO BENCHMARKS YET