TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Question Answering (VQA)	TDIUC	MuRel	Accuracy	88.2	# 1
Visual Question Answering (VQA)	VQA-CP	MuRel	Score	39.54	# 9
Visual Question Answering (VQA)	VQA v2 test-dev	MuRel	Accuracy	68.03	# 37
Visual Question Answering (VQA)	VQA v2 test-std	MuRel	overall	68.4	# 32

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/murel-multimodal-relational-reasoning-for/visual-question-answering-on-tdiuc)](https://paperswithcode.com/sota/visual-question-answering-on-tdiuc?p=murel-multimodal-relational-reasoning-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/murel-multimodal-relational-reasoning-for/visual-question-answering-on-vqa-cp)](https://paperswithcode.com/sota/visual-question-answering-on-vqa-cp?p=murel-multimodal-relational-reasoning-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/murel-multimodal-relational-reasoning-for/visual-question-answering-on-vqa-v2-test-std)](https://paperswithcode.com/sota/visual-question-answering-on-vqa-v2-test-std?p=murel-multimodal-relational-reasoning-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/murel-multimodal-relational-reasoning-for/visual-question-answering-on-vqa-v2-test-dev)](https://paperswithcode.com/sota/visual-question-answering-on-vqa-v2-test-dev?p=murel-multimodal-relational-reasoning-for)`

MUREL: Multimodal Relational Reasoning for Visual Question Answering

CVPR 2019 · Remi Cadene, Hedi Ben-Younes, Matthieu Cord, Nicolas Thome

Multimodal attentional networks are currently state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. Although attention lets a model focus on the visual content relevant to the question, this simple mechanism is arguably insufficient to model the complex reasoning features required for VQA or other high-level tasks. In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images. Our first contribution is the introduction of the MuRel cell, an atomic reasoning primitive representing interactions between question and image regions by a rich vectorial representation, and modeling region relations with pairwise combinations. Secondly, we incorporate the cell into a full MuRel network, which progressively refines visual and question interactions, and can be leveraged to define visualization schemes finer than mere attention maps. We validate the relevance of our approach with various ablation studies, and show its superiority to attention-based methods on three datasets: VQA 2.0, VQA-CP v2 and TDIUC. Our final MuRel network is competitive with or outperforms state-of-the-art results in this challenging context. Our code is available at https://github.com/Cadene/murel.bootstrap.pytorch
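
The abstract describes the cell only at a high level, so the following is a rough, illustrative sketch of the idea rather than the authors' implementation (which lives in the linked repository). It assumes an element-wise product as a stand-in fusion, element-wise max-pooling over pairwise terms, and an additive update; the names `fuse`, `murel_cell_sketch` and `murel_network_sketch` are hypothetical and chosen for this example only.

```python
def fuse(a, b):
    # Stand-in fusion (assumption): element-wise product of two vectors.
    # The abstract only says "rich vectorial representation".
    return [x * y for x, y in zip(a, b)]


def murel_cell_sketch(question, regions):
    """One illustrative cell step over a list of region vectors.

    1. Fuse the question with every region (question/region interaction).
    2. For each region, combine its fused vector with every other fused
       vector (pairwise combinations) and max-pool the results (assumption).
    3. Add the fused vector and the pooled context back onto the region
       (additive update, also an assumption).
    """
    fused = [fuse(question, r) for r in regions]
    updated = []
    for i, m_i in enumerate(fused):
        pairs = [fuse(m_i, m_j) for j, m_j in enumerate(fused) if j != i]
        if pairs:
            context = [max(col) for col in zip(*pairs)]
        else:
            # A single region has no neighbours to relate to.
            context = [0.0] * len(m_i)
        updated.append([r + m + c for r, m, c in zip(regions[i], m_i, context)])
    return updated


def murel_network_sketch(question, regions, n_steps=3):
    # Apply the cell repeatedly to refine the region representations,
    # then max-pool over regions to obtain a single vector.
    for _ in range(n_steps):
        regions = murel_cell_sketch(question, regions)
    return [max(col) for col in zip(*regions)]


if __name__ == "__main__":
    q = [1.0, 2.0]
    v = [[1.0, 0.0], [0.0, 1.0]]
    print(murel_network_sketch(q, v, n_steps=1))
```

In a real model the pooled vector would be passed to a classifier over candidate answers, and the fusion would be a learned operator rather than a fixed product.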

Code

Cadene/murel.bootstrap.pytorch official

193

Tasks

Relational Reasoning

Visual Question Answering

Visual Question Answering (VQA)

Datasets

Visual Question Answering

CLEVR

Visual Question Answering v2.0

TDIUC

VQA-CP

Results from the Paper

Ranked #1 on Visual Question Answering (VQA) on TDIUC
