TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Question Answering (VQA)	VizWiz 2018	B-Ultra	overall	53.68	# 4
Visual Question Answering (VQA)	VizWiz 2018	B-Ultra	yes/no	68.12	# 5
Visual Question Answering (VQA)	VizWiz 2018	B-Ultra	number	28.81	# 1
Visual Question Answering (VQA)	VizWiz 2018	B-Ultra	other	35.41	# 3
Visual Question Answering (VQA)	VizWiz 2018	B-Ultra	unanswerable	84.03	# 2

Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering

IJCNLP 2019 · Soravit Changpinyo, Bo Pang, Piyush Sharma, Radu Soricut

Object detection plays an important role in current solutions to vision and language tasks like image captioning and visual question answering. However, popular models like Faster R-CNN rely on a costly process of annotating ground-truths for both the bounding boxes and their corresponding semantic labels, making it less amenable as a primitive task for transfer learning. In this paper, we examine the effect of decoupling box proposal and featurization for down-stream tasks. The key insight is that this allows us to leverage a large amount of labeled annotations that were previously unavailable for standard object detection benchmarks. Empirically, we demonstrate that this leads to effective transfer learning and improved image captioning and visual question answering models, as measured on publicly available benchmarks.

Code

No code implementations yet.

Tasks

Image Captioning

Object Detection

Question Answering

Transfer Learning

Visual Question Answering

Visual Question Answering (VQA)

Datasets

MS COCO

Visual Question Answering

Visual Genome

Conceptual Captions

VizWiz

Results from the Paper

Ranked #4 on Visual Question Answering (VQA) on VizWiz 2018

Methods

Convolution • Faster R-CNN • RoIPool • RPN • Softmax
