Multimodal Residual Learning for Visual QA

NeurIPS 2016 · Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang

Deep neural networks continue to advance the state of the art in image recognition tasks with various methods. However, applications of these methods to multimodality remain limited. We present Multimodal Residual Networks (MRN) for the multimodal residual learning of visual question-answering, which extends the idea of deep residual learning. Unlike deep residual learning, MRN effectively learns the joint representation from vision and language information. The main idea is to use element-wise multiplication for the joint residual mappings, exploiting the residual learning of the attentional models in recent studies. Various alternative models introduced by multimodality are explored based on our study. We achieve state-of-the-art results on the Visual QA dataset for both Open-Ended and Multiple-Choice tasks. Moreover, we introduce a novel method to visualize the attention effect of the joint representations for each learning block using the back-propagation algorithm, even though the visual features are collapsed without spatial information.
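
The main idea in the abstract, a joint residual mapping built from an element-wise product of question and image features and added to a shortcut of the question, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the tanh nonlinearity, the two-layer visual projection, the linear shortcut, the toy dimensions, and every name here (`init_params`, `mrn_block`, `W_q`, `W_1`, `W_2`, `W_skip`) are assumptions made for the example.

```python
import numpy as np


def init_params(rng, q_dim, v_dim, hidden_dim):
    """Random weights for one block (hypothetical names and shapes)."""
    scale = 0.1
    return {
        "W_q": scale * rng.standard_normal((hidden_dim, q_dim)),
        "W_1": scale * rng.standard_normal((hidden_dim, v_dim)),
        "W_2": scale * rng.standard_normal((hidden_dim, hidden_dim)),
        "W_skip": scale * rng.standard_normal((hidden_dim, q_dim)),
    }


def mrn_block(q, v, p):
    """One learning block: linear shortcut of q plus a joint residual."""
    q_branch = np.tanh(p["W_q"] @ q)                       # question branch
    v_branch = np.tanh(p["W_2"] @ np.tanh(p["W_1"] @ v))   # vision branch
    joint_residual = q_branch * v_branch                   # element-wise product
    return p["W_skip"] @ q + joint_residual


rng = np.random.default_rng(0)
q = rng.standard_normal(8)    # toy question embedding
v = rng.standard_normal(16)   # toy image feature vector (no spatial layout)

# Stack three blocks; each block's output is the next block's question input,
# while the same visual feature vector is fed to every block.
input_dims = [8, 12, 12]
blocks = [init_params(rng, d, 16, 12) for d in input_dims]
h = q
for p in blocks:
    h = mrn_block(h, v, p)

print(h.shape)  # (12,)
```

In this sketch, zeroing the vision branch makes the joint residual vanish and the block reduces to its shortcut, which is the sense in which the multiplicative term acts as a residual on top of the question representation.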

Code

jnhwkim/nips-mrn-vqa official

Tasks

Multiple-choice

Question Answering

Visual Question Answering

Visual Question Answering (VQA)

Datasets

MS COCO

Visual Question Answering

Results from the Paper

Ranked #6 on Visual Question Answering (VQA) on COCO Visual Question Answering (VQA) real images 1.0 multiple choice

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Visual Question Answering (VQA)	COCO Visual Question Answering (VQA) real images 1.0 multiple choice	MRN	Percentage correct	66.3	# 6
Visual Question Answering (VQA)	COCO Visual Question Answering (VQA) real images 1.0 open ended	MRN + global features	Percentage correct	61.8	# 7
