WIQA: A dataset for "What if..." reasoning over procedural text
We introduce WIQA, the first large-scale dataset of "What if..." questions over procedural text. WIQA contains a collection of paragraphs, each annotated with multiple influence graphs describing how one change affects another, and a large (40k) collection of "What if...?" multiple-choice questions derived from these. For example, given a paragraph about beach erosion, would stormy weather hasten or decelerate erosion? WIQA contains three kinds of questions: perturbations to steps mentioned in the paragraph; external (out-of-paragraph) perturbations requiring commonsense knowledge; and irrelevant (no effect) perturbations. We find that state-of-the-art models achieve 73.8% accuracy, well below the human performance of 96.3%. We analyze the challenges, in particular tracking chains of influences, and present the dataset as an open challenge to the community.