ELI5 is a dataset for long-form question answering. It contains 270K complex, diverse questions that require explanatory multi-sentence answers. Web search results are used as evidence documents to answer each question.
120 PAPERS • 1 BENCHMARK
QuALITY (Question Answering with Long Input Texts, Yes!) is a multiple-choice question answering dataset for long document comprehension. The dataset consists of context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike prior passage-based datasets, the questions are written and validated by contributors who have read the entire passage, rather than relying on summaries or excerpts.
52 PAPERS • 1 BENCHMARK
LLeQA is a native French dataset for studying information retrieval and long-form question answering in the legal domain. It consists of a knowledge corpus of 27,941 statutory articles collected from Belgian legislation, and 1,868 legal questions posed by Belgian citizens, each labeled by experienced jurists with a comprehensive answer rooted in relevant articles from the corpus.
1 PAPER • NO BENCHMARKS YET