MAFiD: Moving Average Equipped Fusion-in-Decoder for Question Answering over Tabular and Textual Data

Transformer-based models for question answering (QA) over tables and text must process a long hybrid sequence of tabular and textual elements, which makes long-range reasoning difficult. To handle long-range reasoning, we extensively employ a fusion-in-decoder (FiD) and exponential moving average (EMA), proposing a Moving Average Equipped Fusion-in-Decoder (MAFiD). With FiD as the backbone architecture, MAFiD combines various levels of reasoning: independent encoding of homogeneous data, and single-row and multi-row heterogeneous reasoning, using a gated cross-attention layer to effectively aggregate the three types of representations produced by these reasonings. Experimental results on HybridQA indicate that MAFiD achieves state-of-the-art performance, improving exact match (EM) and F1 by 1.1 and 1.7, respectively, on the blind test set.
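The abstract does not include implementation details, so the following is a minimal sketch of how a gated cross-attention layer could aggregate the three representation streams (independent encoding, single-row, and multi-row heterogeneous reasoning) inside an FiD-style decoder. The class name, the per-position softmax gate, and all shapes are illustrative assumptions, not the authors' actual formulation.

```python
import torch
import torch.nn as nn


class GatedCrossAttentionFusion(nn.Module):
    """Sketch of gated fusion over three encoder representation streams."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # One cross-attention block per reasoning level.
        self.attn_indep = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_single = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_multi = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate producing per-position weights over the three attended streams
        # (an assumed gating formulation, for illustration only).
        self.gate = nn.Linear(3 * d_model, 3)

    def forward(self, dec_hidden, enc_indep, enc_single, enc_multi):
        # dec_hidden: (batch, tgt_len, d_model) decoder queries
        # enc_*:      (batch, src_len, d_model) encoder outputs per reasoning level
        a, _ = self.attn_indep(dec_hidden, enc_indep, enc_indep)
        b, _ = self.attn_single(dec_hidden, enc_single, enc_single)
        c, _ = self.attn_multi(dec_hidden, enc_multi, enc_multi)
        # Softmax gate decides, per decoder position, how much of each stream to keep.
        weights = torch.softmax(self.gate(torch.cat([a, b, c], dim=-1)), dim=-1)
        fused = weights[..., 0:1] * a + weights[..., 1:2] * b + weights[..., 2:3] * c
        # Residual connection back into the decoder stream.
        return dec_hidden + fused


if __name__ == "__main__":
    B, T, S, D = 2, 4, 16, 64
    layer = GatedCrossAttentionFusion(D, n_heads=8)
    out = layer(torch.randn(B, T, D), torch.randn(B, S, D),
                torch.randn(B, S, D), torch.randn(B, S, D))
    print(out.shape)  # torch.Size([2, 4, 64])
```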


Datasets

HybridQA
Results from the Paper


Task                Dataset    Model   Metric Name   Metric Value   Global Rank
Question Answering  HybridQA   MAFiD   ANS-EM        65.4           #1
