TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Dialog	VisDial v0.9 val	RVA	MRR	0.6634	# 13
Visual Dialog	VisDial v0.9 val	RVA	Mean Rank	3.93	# 4
Visual Dialog	VisDial v0.9 val	RVA	R@1	52.71	# 6
Visual Dialog	VisDial v0.9 val	RVA	R@10	90.73	# 4
Visual Dialog	VisDial v0.9 val	RVA	R@5	82.97	# 5
Visual Dialog	Visual Dialog v1.0 test-std	RVA	NDCG (x 100)	55.59	# 65
Visual Dialog	Visual Dialog v1.0 test-std	RVA	MRR (x 100)	63.03	# 33
Visual Dialog	Visual Dialog v1.0 test-std	RVA	R@1	49.03	# 35
Visual Dialog	Visual Dialog v1.0 test-std	RVA	R@5	80.40	# 31
Visual Dialog	Visual Dialog v1.0 test-std	RVA	R@10	89.83	# 24
Visual Dialog	Visual Dialog v1.0 test-std	RVA	Mean Rank	4.18	# 52
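The metrics in the table follow the usual VisDial retrieval protocol. Each question comes with 100 candidate answers, the model ranks them, and the 1-based rank of the ground-truth answer is recorded. R@k is the percentage of questions whose ground-truth answer lands in the top k, MRR is the mean of 1/rank (reported as a fraction for v0.9 and scaled by 100 for v1.0), and Mean Rank is the average rank, where lower is better. NDCG, reported only for v1.0, scores the whole ranking against dense relevance annotations. The following is a minimal sketch of these formulas, not the official evaluation code. The function names are hypothetical, and the NDCG cutoff (k equals the number of candidates with nonzero relevance) follows the commonly used VisDial v1.0 convention, which is an assumption here.

```python
import math
from typing import Dict, Sequence


def retrieval_metrics(gt_ranks: Sequence[int], ks: Sequence[int] = (1, 5, 10)) -> Dict[str, float]:
    """MRR, R@k (in percent) and mean rank from 1-based ranks of the ground-truth answer.

    Hypothetical helper; mirrors the textbook definitions, not the official evaluator.
    """
    n = len(gt_ranks)
    if n == 0:
        raise ValueError("need at least one rank")
    metrics = {
        "MRR": sum(1.0 / r for r in gt_ranks) / n,   # fraction in [0, 1]
        "Mean Rank": sum(gt_ranks) / n,              # lower is better
    }
    for k in ks:
        # Percentage of questions whose ground-truth answer is within the top k.
        metrics[f"R@{k}"] = 100.0 * sum(1 for r in gt_ranks if r <= k) / n
    return metrics


def ndcg(scores: Sequence[float], relevance: Sequence[float]) -> float:
    """NDCG for one question; assumed cutoff k = number of candidates with relevance > 0."""
    k = sum(1 for r in relevance if r > 0)
    if k == 0:
        return 0.0  # convention chosen here for questions with no relevant candidate
    # Candidate indices sorted by model score, best first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Standard logarithmic discount 1 / log2(position + 1) for 1-based positions.
    discounts = [1.0 / math.log2(pos + 2) for pos in range(k)]
    dcg = sum(relevance[idx] * d for idx, d in zip(order[:k], discounts))
    ideal = sorted(relevance, reverse=True)[:k]
    idcg = sum(rel * d for rel, d in zip(ideal, discounts))
    return dcg / idcg
```

For example, ground-truth ranks of 1, 3, 12 and 2 over four questions give MRR = 23/48 ≈ 0.479, Mean Rank = 4.5, R@1 = 25.0 and R@5 = R@10 = 75.0.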


Recursive Visual Attention in Visual Dialog

CVPR 2019 · Yulei Niu, Hanwang Zhang, Manli Zhang, Jianhong Zhang, Zhiwu Lu, Ji-Rong Wen

Visual dialog is a challenging vision-language task that requires an agent to answer multi-round questions about an image. It typically needs to address two major problems: (1) how to answer visually grounded questions, which is the core challenge in visual question answering (VQA); and (2) how to infer the co-reference between questions and the dialog history. As an example of visual co-reference, the pronoun "they" in the question "Are they on or off?" is linked both to the noun "lamps" in the dialog history ("How many lamps are there?") and to the corresponding object grounded in the image. In this work, to resolve visual co-reference in visual dialog, we propose a novel attention mechanism called Recursive Visual Attention (RvA). Specifically, our dialog agent browses the dialog history until it has sufficient confidence in the visual co-reference resolution, and refines the visual attention recursively. Quantitative and qualitative experimental results on the large-scale VisDial v0.9 and v1.0 datasets demonstrate that the proposed RvA not only outperforms the state-of-the-art methods, but also achieves reasonable recursion and interpretable attention maps without additional annotations. The code is available at https://github.com/yuleiniu/rva.
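To make the recursion described in the abstract concrete, here is a toy sketch in plain Python. It is not the authors' implementation (see the official yuleiniu/rva repository): the per-round region scores `region_logits`, the confidence gate `is_resolved`, the history pointer `refer_back`, and the averaging merge rule are all hypothetical stand-ins for the learned modules in the paper. It only illustrates the control flow: if the current question is judged self-contained, attend with it directly; otherwise recurse into the referenced earlier round and blend that round's attention with the current one.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax turning region scores into an attention map."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recursive_visual_attention(
    t: int,
    region_logits: Callable[[int], Sequence[float]],  # hypothetical: question-to-region scores at round t
    is_resolved: Callable[[int], bool],               # hypothetical: "confident enough" gate for round t
    refer_back: Callable[[int], int],                 # hypothetical: earlier round that round t refers to
) -> Vector:
    """Toy recursion in the spirit of RvA; every component here is a placeholder."""
    own = softmax(region_logits(t))
    # Base case: first round, or the question is judged unambiguous on its own.
    if t == 0 or is_resolved(t):
        return own
    j = refer_back(t)
    if not 0 <= j < t:
        raise ValueError("refer_back must point to an earlier round")
    # Recurse into the referenced history round, then blend with the current map.
    prev = recursive_visual_attention(j, region_logits, is_resolved, refer_back)
    # Averaging two distributions keeps the result normalized (assumed merge rule).
    return [(a + b) / 2 for a, b in zip(prev, own)]
```

For instance, with three image regions, round 0 ("How many lamps are there?") scoring region 0 highly and round 1 ("Are they on or off?") scoring all regions equally, an unresolved round 1 that refers back to round 0 ends up with most of its attention on region 0 instead of a uniform map.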


Code


yuleiniu/rva (official)

Tasks


Question Answering

Visual Dialog

Visual Question Answering

Visual Question Answering (VQA)

Datasets

MS COCO

Visual Question Answering

Visual Genome

VisDial

Results from the Paper


Ranked #13 on Visual Dialog on VisDial v0.9 val


