Visual Dialog
54 papers with code • 8 benchmarks • 10 datasets
Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
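The task setup above can be sketched as a small data structure plus a prompt-building step. This is an illustrative sketch only; the class and function names are invented here and do not come from any particular Visual Dialog codebase:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QARound:
    """One completed question-answer turn in the dialog history."""
    question: str
    answer: str

@dataclass
class VisualDialogExample:
    """A single Visual Dialog instance: image, caption, history, and the
    follow-up question the agent must answer."""
    image_id: str
    caption: str
    history: List[QARound] = field(default_factory=list)
    question: str = ""

def build_prompt(ex: VisualDialogExample) -> str:
    # Flatten the caption, prior turns, and the current question into a
    # single text input, as a text-side model might consume it.
    turns = [ex.caption]
    for r in ex.history:
        turns.append(f"Q: {r.question} A: {r.answer}")
    turns.append(f"Q: {ex.question} A:")
    return "\n".join(turns)

example = VisualDialogExample(
    image_id="img_001",
    caption="Two dogs playing in a park.",
    history=[QARound("How many dogs are there?", "Two.")],
    question="What are they doing?",
)
print(build_prompt(example))
```

A real system would pair this text input with image features; the sketch only shows how the dialog history and follow-up question are threaded together.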
Libraries
Use these libraries to find Visual Dialog models and implementations.
Latest papers
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.
Collecting Visually-Grounded Dialogue with A Game Of Sorts
We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call "A Game Of Sorts".
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
It utilizes a combination of several fundamental experts to accommodate multiple dialogue-related tasks and can be pre-trained using limited dialogue and extensive non-dialogue multi-modal data.
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
Prior work performs the standard likelihood training for answer generation on the positive instances (involving correct answers).
LAVIS: A Library for Language-Vision Intelligence
We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.
Video Dialog as Conversation about Objects Living in Space-Time
To tackle these challenges we present a new object-centric framework for video dialog that supports neural reasoning dubbed COST - which stands for Conversation about Objects in Space-Time.
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
In this paper, we propose VD-PCR, a novel framework to improve Visual Dialog understanding with Pronoun Coreference Resolution in both implicit and explicit ways.
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
As a result, GST scales the training data to an order of magnitude more than VisDial (from 1.2M to 12.9M QA pairs).
Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene
Visual dialog has witnessed great progress since various vision-oriented goals were introduced into the conversation, notably GuessWhich, where the image is visible to only one of the questioner and the answerer, and GuessWhat, where it is visible to both.
UNITER-Based Situated Coreference Resolution with Rich Multimodal Input
Our model ranks second in the official evaluation on the object coreference resolution task with an F1 score of 73.3% after model ensembling.