Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge

This paper presents a state-of-the-art model for visual question answering (VQA), which won first place in the 2017 VQA Challenge. VQA is a task of significant importance for research in artificial intelligence, given its multimodal nature, clear evaluation protocol, and potential real-world applications. The performance of deep neural networks for VQA is highly dependent on choices of architecture and hyperparameters. To help further research in the area, we describe in detail our high-performing, though relatively simple, model. Through a massive exploration of architectures and hyperparameters representing more than 3,000 GPU-hours, we identified the tips and tricks that lead to its success, namely: sigmoid outputs, soft training targets, image features from bottom-up attention, gated tanh activations, output embeddings initialized using GloVe and Google Images, large mini-batches, and smart shuffling of training data. We provide a detailed analysis of their impact on performance to assist others in making an appropriate selection.
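Two of the listed tricks, gated tanh activations and sigmoid outputs trained against soft targets, translate directly into code. Below is a minimal PyTorch sketch illustrating them; the module and function names (GatedTanh, VQAOutput, vqa_loss) and dimensions are illustrative assumptions, not taken from the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedTanh(nn.Module):
    """Gated tanh layer: y = tanh(Wx + b) * sigmoid(W'x + b')."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.gate = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return torch.tanh(self.fc(x)) * torch.sigmoid(self.gate(x))

class VQAOutput(nn.Module):
    """Scores every candidate answer independently (sigmoid outputs),
    rather than normalizing over answers with a softmax."""
    def __init__(self, fused_dim, hidden_dim, num_answers):
        super().__init__()
        self.hidden = GatedTanh(fused_dim, hidden_dim)
        self.logits = nn.Linear(hidden_dim, num_answers)

    def forward(self, fused_features):
        return self.logits(self.hidden(fused_features))  # one raw score per answer

def vqa_loss(logits, soft_targets):
    """Soft training targets: each candidate answer receives a score in [0, 1]
    derived from how many human annotators gave that answer, and the model is
    trained with binary cross-entropy against those scores."""
    return F.binary_cross_entropy_with_logits(logits, soft_targets)
```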

Published at CVPR 2018.
Task | Dataset | Model | Metric | Value | Global Rank
Visual Question Answering (VQA) | VQA v2 test-dev | Image features from bottom-up attention (adaptive K, ensemble) | Accuracy | 69.87 | #33
Visual Question Answering (VQA) | VQA v2 test-std | Image features from bottom-up attention (adaptive K, ensemble) | Overall accuracy | 70.3 | #30
