TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK	REMOVE
Visual Question Answering (VQA)	A-OKVQA	A Simple Baseline for KB-VQA	DA VQA Score	57.5	# 5
Visual Question Answering (VQA)	OK-VQA	A Simple Baseline for KB-VQA	Accuracy	61.2	# 7

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-simple-baseline-for-knowledge-based-visual/visual-question-answering-on-a-okvqa)](https://paperswithcode.com/sota/visual-question-answering-on-a-okvqa?p=a-simple-baseline-for-knowledge-based-visual)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-simple-baseline-for-knowledge-based-visual/visual-question-answering-on-ok-vqa)](https://paperswithcode.com/sota/visual-question-answering-on-ok-vqa?p=a-simple-baseline-for-knowledge-based-visual)`

A Simple Baseline for Knowledge-Based Visual Question Answering

20 Oct 2023 · Alexandros Xenos, Themos Stafylakis, Ioannis Patras, Georgios Tzimiropoulos ·

This paper is on the problem of Knowledge-Based Visual Question Answering (KB-VQA). Recent works have emphasized the significance of incorporating both explicit (through external databases) and implicit (through LLMs) knowledge to answer questions requiring external knowledge effectively. A common limitation of such approaches is that they consist of relatively complicated pipelines and often heavily rely on accessing GPT-3 API. Our main contribution in this paper is to propose a much simpler and readily reproducible pipeline which, in a nutshell, is based on efficient in-context learning by prompting LLaMA (1 and 2) using question-informative captions as contextual information. Contrary to recent approaches, our method is training-free, does not require access to external databases or APIs, and yet achieves state-of-the-art accuracy on the OK-VQA and A-OK-VQA datasets. Finally, we perform several ablation studies to understand important aspects of our method. Our code is publicly available at https://github.com/alexandrosXe/ASimple-Baseline-For-Knowledge-Based-VQA

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

In-Context Learning

Question Answering

Visual Question Answering

Visual Question Answering (VQA)

Datasets

OK-VQA

A-OKVQA

Results from the Paper

Add Remove

Ranked #5 on Visual Question Answering (VQA) on A-OKVQA (DA VQA Score metric)

Get a GitHub badge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Benchmark
Visual Question Answering (VQA)	A-OKVQA	A Simple Baseline for KB-VQA	DA VQA Score	57.5	# 5	Compare
Visual Question Answering (VQA)	OK-VQA	A Simple Baseline for KB-VQA	Accuracy	61.2	# 7	Compare

Methods

Add Remove

Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • LLaMA • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Weight Decay

Edit Social Preview

A Simple Baseline for Knowledge-Based Visual Question Answering

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove