278 dataset results for Question Answering AND Texts

The COPA-HR dataset (Choice of Plausible Alternatives in Croatian) is a translation of the English COPA dataset that follows the XCOPA translation methodology. The dataset consists of 1,000 premises (e.g., My body cast a shadow over the grass), each paired with a question (What is the cause?) and two choices (The sun was rising; The grass was cut), plus a label indicating which choice the annotator or translator judged more plausible (The sun was rising).
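To make the item structure described above concrete, here is a minimal sketch of one COPA-style record and a simple accuracy metric. The field names (premise, question, choice1, choice2, label) and the 0/1 label encoding are assumptions modelled on the example in the description, not the official COPA-HR schema, and the example uses the English gloss rather than the Croatian text.

```python
from dataclasses import dataclass
from typing import Literal, Sequence


@dataclass(frozen=True)
class CopaExample:
    """One COPA-style item; field names are illustrative, not an official schema."""
    premise: str
    question: Literal["cause", "effect"]
    choice1: str
    choice2: str
    label: int  # assumed encoding: 0 -> choice1 is more plausible, 1 -> choice2

    def answer(self) -> str:
        """Return the text of the alternative marked as more plausible."""
        return self.choice1 if self.label == 0 else self.choice2


def accuracy(examples: Sequence[CopaExample], predictions: Sequence[int]) -> float:
    """Fraction of items where the predicted index (0 or 1) matches the label."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(ex.label == pred for ex, pred in zip(examples, predictions))
    return correct / len(examples)


example = CopaExample(
    premise="My body cast a shadow over the grass.",
    question="cause",
    choice1="The sun was rising.",
    choice2="The grass was cut.",
    label=0,
)
print(example.answer())          # The sun was rising.
print(accuracy([example], [0]))  # 1.0
```

Because each item has exactly two alternatives, a model only needs to output an index per item, and chance-level accuracy is 0.5.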

1 PAPER • NO BENCHMARKS YET

CUHK-QA

CUHK-QA is a dataset for natural language-based person search using iterative questioning.

1 PAPER • NO BENCHMARKS YET

ChAII - Hindi and Tamil Question Answering

The dataset covers Hindi and Tamil and was collected without the use of translation. It provides a realistic information-seeking task with questions written by native-speaking expert data annotators.

1 PAPER • 1 BENCHMARK

ChiQA (Chinese VQA)

ChiQA is a dataset designed for visual question answering that measures not only relatedness but also answerability, which demands more fine-grained vision and language reasoning. It contains more than 40K questions and more than 200K question-image pairs. The questions are real-world, image-independent queries that are more diverse and unbiased.

1 PAPER • NO BENCHMARKS YET

CompMix

CompMix is a crowdsourced QA benchmark that naturally requires integrating a mixture of input sources. CompMix has a total of 9,410 questions and features several complex intents, such as joins and temporal conditions.

1 PAPER • NO BENCHMARKS YET

CoreSearch

CoreSearch is a dataset for Cross-Document Event Coreference Search. It consists of two separate passage collections: (1) a collection of passages containing manually annotated coreferring event mentions, and (2) an annotated collection of distractor passages.

1 PAPER • NO BENCHMARKS YET

Dialog-based Language Learning dataset

The Dialog-based Language Learning dataset is designed to measure how well models can learn as a student from a teacher’s textual responses to the student’s answers (and, potentially, from an external real-valued reward signal).

1 PAPER • NO BENCHMARKS YET

EpiK-Eval (Epistemic Knowledge Evaluation)

A benchmark for evaluating the ability of LMs to consolidate and recall information from multiple training documents.

1 PAPER • NO BENCHMARKS YET

Financial Language Understanding Evaluation

Financial Language Understanding Evaluation is an open-source, comprehensive suite of benchmarks for the financial domain. It contains benchmarks across five NLP tasks in the financial domain, as well as common benchmarks used in previous research. The tasks are financial sentiment analysis, news headline classification, named entity recognition, structure boundary detection, and question answering.

1 PAPER • NO BENCHMARKS YET

LLeQA (Long-form Legal Question Answering)

LLeQA is a native French dataset for studying information retrieval and long-form question answering in the legal domain. It consists of a knowledge corpus of 27,941 statutory articles collected from Belgian legislation, and 1,868 legal questions posed by Belgian citizens and labeled by experienced jurists with comprehensive answers rooted in relevant articles from the corpus.

1 PAPER • NO BENCHMARKS YET

MLQuestions

MLQuestions is a domain-adaptation dataset for the machine learning domain containing 50K unaligned passages, 35K unaligned questions, and 3K aligned passage-question pairs.

1 PAPER • NO BENCHMARKS YET

NText

NText is an eight-million-word dataset extracted and preprocessed from nuclear research papers and theses.

1 PAPER • NO BENCHMARKS YET

PQ-decaNLP (Paraphrase Questions - decaNLP)

Multitask learning has led to significant advances in Natural Language Processing, including the decaNLP benchmark, in which question answering is used to frame 10 natural language understanding tasks in a single model. PQ-decaNLP is a crowdsourced corpus of paraphrased questions, annotated with paraphrase phenomena. This enables analysis of how transformations such as swapping the class labels and changing the sentence modality lead to large performance degradation.

1 PAPER • NO BENCHMARKS YET

PersianQA (Persian Question Answering Dataset)

The Persian Question Answering (PersianQA) dataset is a reading comprehension dataset built on Persian Wikipedia. The crowdsourced dataset consists of more than 9,000 entries. Each entry is either an unanswerable question or a question with one or more answers spanning the passage (the context) from which the questioner proposed the question. Much like SQuAD2.0, the unanswerable questions can be used to build a system that "knows that it doesn't know the answer".

1 PAPER • NO BENCHMARKS YET

Phrase-in-Context

Phrase in Context (PiC) is a curated benchmark for phrase understanding and semantic search, consisting of three tasks of increasing difficulty: Phrase Similarity (PS), Phrase Retrieval (PR), and Phrase Sense Disambiguation (PSD). The datasets were annotated by 13 linguistic experts on Upwork and verified by two groups: ~1,000 AMT crowdworkers and another set of 5 linguistic experts. The PiC benchmark is distributed under CC-BY-NC 4.0.

1 PAPER • NO BENCHMARKS YET

Pirá

Pirá (Pirá: A Bilingual Portuguese-English Dataset for Question-Answering about the Ocean)

Pirá is a crowdsourced question answering (QA) dataset designed for reading comprehension, containing a large set of questions and answers about the ocean and the Brazilian coast in both Portuguese and English.

1 PAPER • NO BENCHMARKS YET

QALD-9-Plus

QALD-9-Plus is a dataset for Knowledge Graph Question Answering (KGQA) based on the well-known QALD-9 dataset.

1 PAPER • 1 BENCHMARK

QASiNa (Question Answering Sirah Nabawiyah)

The Question Answering Sirah Nabawiyah (QASiNa) dataset is a reading comprehension dataset consisting of question-answer pairs drawn from Sirah Nabawiyah literature in the Indonesian language.

1 PAPER • NO BENCHMARKS YET

SCIMAT

SCIMAT is a large question-answer dataset of mathematics and science problems; such a dataset can have an impact on online education, intelligent tutoring, and automated grading.

1 PAPER • NO BENCHMARKS YET

SOTU_QA_2023

A curated QA benchmark on the 2023 State of the Union Address. It contains curated questions and answers based on knowledge presented in the State of the Union Address delivered in February 2023. It is especially useful for examining the ability of tool-augmented LMs (ALMs) to answer questions over private documents.

1 PAPER • NO BENCHMARKS YET

SQuAD-it

SQuAD-it is derived from the SQuAD dataset through semi-automatic translation into Italian. It is a large-scale dataset for open question answering on factoid questions in Italian, containing more than 60,000 question/answer pairs derived from the original English dataset.

1 PAPER • 1 BENCHMARK

ScienceExamCER

ScienceExamCER is a collection of resources for studying explanation-centered inference, including explanation graphs for 1,680 questions, with 4,950 tablestore rows, and other analyses of the knowledge required to answer elementary and middle-school science questions.

1 PAPER • NO BENCHMARKS YET

TinySocial

TinySocial is a dataset to enable research on Social Visual Question Answering.

1 PAPER • NO BENCHMARKS YET

ToM QA

The data consists of 3 task types and 4 question types, yielding 12 scenarios in total. The tasks are grouped into stories, which are denoted by the numbering at the start of each line.

1 PAPER • NO BENCHMARKS YET

TriviaHG

Nowadays, people increasingly engage in dialogues with Large Language Models to get answers to their questions. When such answers are readily accessible to anyone, stimulating and preserving humans’ cognitive abilities, and ensuring that humans maintain good reasoning skills, become crucial. This study addresses these needs by proposing hints (instead of, or before, final answers) as a viable solution. We introduce a framework for automatic hint generation for factoid questions and use it to construct TriviaHG, a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset. Additionally, we present an automatic evaluation method that measures the Convergence and Familiarity quality attributes of hints. To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 humans with answering questions using the provided hints.

1 PAPER • NO BENCHMARKS YET

TupleInf Open IE Dataset

The TupleInf Open IE dataset contains Open IE tuples extracted from 263K sentences that were used by the solver in “Answering Complex Questions Using Open Information Extraction” (referred to as Tuple KB, T). These sentences were collected from a large Web corpus using training questions from 4th and 8th grade as queries: 156K sentences were collected for 4th-grade questions and 107K sentences for 8th-grade questions. Each sentence is followed by its Open IE v4 tuples in their simple format.

1 PAPER • NO BENCHMARKS YET

VDQG (Visual Discriminative Question Generation)

The Visual Discriminative Question Generation (VDQG) dataset contains 11202 ambiguous image pairs collected from Visual Genome. Each image pair is annotated with 4.6 discriminative questions and 5.9 non-discriminative questions on average.

1 PAPER • NO BENCHMARKS YET

VQA 360°

VQA 360° is a dataset for visual question answering on 360° images containing around 17,000 real-world image-question-answer triplets for a variety of question types.

1 PAPER • NO BENCHMARKS YET

VTQA

VTQA (Visual Text Question Answering)

VTQA is a dataset containing open-ended questions about image-text pairs. The dataset requires a model to align multimedia representations of the same entity in order to perform multi-hop reasoning between image and text, and finally to answer the question in natural language. The aim of this dataset is to develop and benchmark models capable of multimedia entity alignment, multi-step reasoning, and open-ended answer generation. The VTQA dataset consists of 10,238 image-text pairs and 27,317 questions. The images are real images from the MSCOCO dataset and contain a variety of entities. Annotators first annotated relevant text based on each image, then asked questions about the image-text pair, and finally answered each question in an open-ended manner.

1 PAPER • NO BENCHMARKS YET

VlogQA

VlogQA (Vietnamese Spoken-Based Machine Reading Comprehension)

VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube, an extensive source of user-uploaded content, covering food and travel topics in Vietnamese. The dataset is used for research in Vietnamese spoken-based machine reading comprehension.

1 PAPER • NO BENCHMARKS YET

WikiSuggest

To collect WikiSuggest, the Google Suggest API is used to harvest natural language questions, which are then submitted to Google Search. Whenever Google Search returns a box with a short answer from Wikipedia, an example is created from the question, the answer, and the Wikipedia document. If the answer string is missing from the document, this often indicates a spurious question-answer pair, such as (‘what time is half time in rugby’, ‘80 minutes, 40 minutes’), so question-answer pairs without the exact answer string are pruned. Of fifty examples examined after filtering, 54% were well-formed question-answer pairs whose answers can be grounded in the document, 20% contained answers without textual evidence in the document (the answer string appears in an irrelevant context), and 26% contained incorrect QA pairs.
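The pruning rule above (drop any pair whose exact answer string does not occur in the document) can be sketched as a simple substring filter. This is a minimal illustration, not the original collection pipeline: the Candidate type and prune_ungrounded function are hypothetical names, the document texts below are invented for the example, and any text normalisation the real pipeline may apply is not modelled. The last lines convert the reported audit percentages over 50 examples into counts.

```python
from typing import Iterable, List, NamedTuple


class Candidate(NamedTuple):
    """A harvested (question, answer, document) triple before filtering (hypothetical type)."""
    question: str
    answer: str
    document: str


def prune_ungrounded(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only candidates whose exact answer string occurs in the document.

    This is a plain, case-sensitive substring test; any normalisation the
    original pipeline may apply is not modelled here.
    """
    return [c for c in candidates if c.answer in c.document]


candidates = [
    Candidate(
        "what time is half time in rugby",
        "80 minutes, 40 minutes",
        "A rugby union match lasts 80 minutes, played as two halves of 40 minutes.",
    ),
    Candidate(
        "who wrote hamlet",
        "William Shakespeare",
        "Hamlet is a tragedy written by William Shakespeare.",
    ),
]
kept = prune_ungrounded(candidates)
print([c.question for c in kept])  # ['who wrote hamlet']

# The manual audit of 50 filtered examples, converted from percentages to counts.
audit_percent = {"well_formed": 54, "no_textual_evidence": 20, "incorrect": 26}
audit_counts = {k: p * 50 // 100 for k, p in audit_percent.items()}
print(audit_counts)  # {'well_formed': 27, 'no_textual_evidence': 10, 'incorrect': 13}
```

Note that the filter only checks that the string is present, which is why the audit still finds answers appearing in an irrelevant context: substring matching cannot tell whether the surrounding text actually supports the answer.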

1 PAPER • NO BENCHMARKS YET

XLingEval

1 PAPER • NO BENCHMARKS YET

XQuAD-IN

Given a question and a passage in an Indic language, generate a short answer span from the passage.

1 PAPER • NO BENCHMARKS YET

Xamarin Q&A

Xamarin Q&A consists of two datasets of questions and answers for studying the development of cross-platform mobile applications using the Xamarin framework. The two datasets were created by mining two Q&A sites: Xamarin Forum and Stack Overflow. The datasets have 85,908 questions mined from the Xamarin Forum and 44,434 from Stack Overflow.

1 PAPER • NO BENCHMARKS YET

XorQA-IN

Given a question in an Indic language and a passage in English, generate a short answer span. We provide both an English and target language answer span in the annotations.

1 PAPER • NO BENCHMARKS YET

catbAbI LM-mode

catbAbI LM-mode (concatenated-bAbI)

We aim to improve the bAbI benchmark as a means of developing intelligent dialogue agents. To this end, we propose concatenated-bAbI (catbAbI): an infinite sequence of bAbI stories. catbAbI is generated from the bAbI dataset: during training, a random sample/story from any task is drawn without replacement and concatenated to the ongoing story. The preprocessing for catbAbI addresses several issues: it removes the supporting facts, leaves the questions embedded in the story, inserts the correct answer after the question mark, and tokenises the full sample into a single sequence of words. As such, catbAbI is designed to be trained autoregressively, analogous to closed-book question answering.
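The preprocessing steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: stories arrive as raw bAbI text lines (a line number, then the sentence; question lines carry a tab-separated answer and supporting-fact ids), tokenisation is a simple regex that splits off punctuation, and once every story has been drawn a fresh random order is started so the stream never ends. The function names story_to_tokens and catbabi_stream are hypothetical.

```python
import random
import re
from itertools import islice
from typing import Iterator, List, Sequence

# Assumed tokeniser: words (with apostrophes) and the punctuation marks . ? ,
TOKEN_RE = re.compile(r"[\w']+|[.?,]")


def story_to_tokens(lines: Sequence[str]) -> List[str]:
    """Flatten one bAbI story (raw text lines) into a single word sequence."""
    tokens: List[str] = []
    for line in lines:
        # Drop the leading line number: "1 Mary moved ..." -> "Mary moved ...".
        _, text = line.strip().split(" ", 1)
        if "\t" in text:
            # Question lines look like "question\tanswer\tsupporting fact ids".
            # Keep the question in place, put the answer right after the '?',
            # and discard the supporting-fact ids.
            question, answer = text.split("\t")[:2]
            text = f"{question} {answer}"
        tokens.extend(TOKEN_RE.findall(text))
    return tokens


def catbabi_stream(stories: Sequence[Sequence[str]], seed: int = 0) -> Iterator[str]:
    """Endless token stream: stories are drawn at random without replacement
    and appended to the ongoing sequence; once every story has been used,
    a fresh random order is drawn (an assumption, not from the source)."""
    if not stories:
        raise ValueError("need at least one story")
    rng = random.Random(seed)
    while True:
        order = list(range(len(stories)))
        rng.shuffle(order)
        for idx in order:
            yield from story_to_tokens(stories[idx])


story = [
    "1 Mary moved to the bathroom.",
    "2 John went to the hallway.",
    "3 Where is Mary? \tbathroom\t1",
]
print(story_to_tokens(story))
# ['Mary', 'moved', 'to', 'the', 'bathroom', '.', 'John', 'went', 'to', 'the',
#  'hallway', '.', 'Where', 'is', 'Mary', '?', 'bathroom']
print(list(islice(catbabi_stream([story]), 20)))
```

Because answers sit directly after their questions in the token stream, a language model trained to predict the next word is implicitly trained to answer, which is what makes the setup analogous to closed-book QA.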

1 PAPER • 2 BENCHMARKS

catbAbI QA-mode

catbAbI QA-mode (concatenated-bAbI)

1 PAPER • 2 BENCHMARKS

simply-CLEVR

The simply-CLEVR dataset aims to provide a benchmark for transparent, quantitative evaluation of explanation methods (also known as heatmap or XAI methods). It consists of simple Visual Question Answering (VQA) questions derived from the original CLEVR task, and each question is accompanied by two ground-truth masks that serve as a basis for evaluating explanations on the input image.

1 PAPER • NO BENCHMARKS YET
