
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

13 Nov 2023 · Junke Wang, Lingchen Meng, Zejia Weng, Bo He, Zuxuan Wu, Yu-Gang Jiang

Existing visual instruction tuning methods typically prompt large language models with textual descriptions to generate instruction-following data. Despite the promising performance achieved, these descriptions are derived from image annotations, which are oftentimes coarse-grained. Furthermore, the instructions might even contradict the visual content without observing the entire visual context. To address this challenge, we introduce a fine-grained visual instruction dataset, LVIS-Instruct4V, which contains 220K visually aligned and context-aware instructions produced by prompting the powerful GPT-4V with images from LVIS. Through experimental validation and case studies, we demonstrate that high-quality visual instructional data could improve the performance of LLaVA-1.5, a state-of-the-art large multimodal model, across a wide spectrum of benchmarks by clear margins. Notably, by simply replacing the LLaVA-Instruct with our LVIS-Instruct4V, we achieve better results than LLaVA on most challenging LMM benchmarks, e.g., LLaVA$^w$ (76.7 vs. 70.7) and MM-Vet (40.2 vs. 35.4). We release our data and model at https://github.com/X2FD/LVIS-INSTRUCT4V.
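
The margins quoted in the abstract can be made explicit with a little arithmetic. The sketch below is only an illustration: the scores are copied from the abstract, while the `gains` helper and the `scores` and `results` names are hypothetical and do not come from the paper or its repository.

```python
# Scores quoted in the abstract, as (with LVIS-Instruct4V, LLaVA baseline).
scores = {
    "LLaVA^w": (76.7, 70.7),
    "MM-Vet": (40.2, 35.4),
}


def gains(ours, baseline):
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = ours - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


# Hypothetical helper output: one (absolute, relative) pair per benchmark.
results = {name: gains(*pair) for name, pair in scores.items()}

for name, (abs_gain, rel_gain) in results.items():
    print(f"{name}: +{abs_gain} points ({rel_gain}% relative)")
```

Under these numbers the absolute gains are 6.0 points on LLaVA$^w$ and 4.8 points on MM-Vet. The relative gain is larger on MM-Vet because its baseline score is lower.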


Code

x2fd/lvis-instruct4v (official, 124 stars)

h-zhao1997/cobra (184 stars)

Tasks


Instruction Following

Visual Question Answering

Datasets

LVIS

GQA

MM-Vet

Results from the Paper


Ranked #35 on Visual Question Answering on MM-Vet


Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Visual Question Answering	MM-Vet	LLaVA-1.5 (LVIS-Instruct4V)	GPT-4 score	40.2	# 35
Visual Question Answering	MM-Vet	LLaVA-1.5 (LVIS-Instruct4V)	Params	13B	# 1
