Visual Dialog
54 papers with code • 8 benchmarks • 10 datasets
Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
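The task input/output can be pictured as a simple data structure: an image, a seed caption, a history of question-answer turns, and the current question to be answered. The class and field names below are hypothetical, chosen only to mirror the structure of datasets such as VisDial; they are not an official API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DialogTurn:
    """One question-answer exchange in the dialog history."""
    question: str
    answer: str

@dataclass
class VisualDialogInstance:
    """Hypothetical container for one Visual Dialog example."""
    image_id: str                 # identifier of the image under discussion
    caption: str                  # short description that seeds the dialog
    history: List[DialogTurn] = field(default_factory=list)

def answer_question(instance: VisualDialogInstance, question: str) -> str:
    """Placeholder for a model: map (image, history, question) to an answer."""
    raise NotImplementedError

# Example instance: the agent would be asked a follow-up about this image.
example = VisualDialogInstance(
    image_id="COCO_0001",
    caption="A man riding a horse on the beach",
    history=[DialogTurn("Is it daytime?", "Yes")],
)
```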
Libraries
Use these libraries to find Visual Dialog models and implementations.
Latest papers
Perceptual Score: What Data Modalities Does Your Model Perceive?
To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.
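One common way to measure how much a model relies on a given modality is to break that modality's pairing with the rest of the input (e.g., by permuting it across the batch) and record the accuracy drop. The sketch below follows that permutation idea; the paper's exact protocol may differ, and `model_fn`, the argument names, and the interface are assumptions for illustration.

```python
import numpy as np

def modality_reliance(model_fn, X_modalities, y, modality, n_permute=10, rng=None):
    """Hedged sketch of a permutation-based reliance score.

    model_fn: assumed interface, maps a list of per-modality arrays to
              predicted labels (one per example).
    Returns the mean accuracy drop when the chosen modality is shuffled
    across examples, breaking its alignment with the other modalities.
    """
    rng = np.random.default_rng(rng)
    base_acc = np.mean(model_fn(X_modalities) == y)
    drops = []
    for _ in range(n_permute):
        perm = rng.permutation(len(y))
        X_perm = [x.copy() for x in X_modalities]
        X_perm[modality] = X_perm[modality][perm]  # misalign this modality only
        drops.append(base_acc - np.mean(model_fn(X_perm) == y))
    return float(np.mean(drops))
```

A model that ignores a modality shows a drop of zero for it, while a modality the model depends on produces a large drop.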
Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser
To enhance the VD Questioner: 1) we propose a Related entity enhanced Questioner (ReeQ) that generates questions under the guidance of related entities and learns an entity-based questioning strategy from human dialogs; 2) we propose an Augmented Guesser (AugG) that is strong and is optimized especially for the VD setting.
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
Most existing work for the Guesser encodes the dialog history as a whole and trains the Guesser models from scratch on GuessWhat?!
Ensemble of MRR and NDCG models for Visual Dialog
However, the NDCG metric favors generally applicable uncertain answers such as "I don't know."
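NDCG works over graded relevances rather than a single correct answer, so a safe answer that carries partial relevance for many questions can keep the score high when ranked near the top. A minimal NDCG@k sketch (standard formula; the helper name and inputs are illustrative, not the paper's code):

```python
import numpy as np

def ndcg_at_k(relevances_ranked, k):
    """NDCG@k for one candidate list, given graded relevances in ranked order.

    relevances_ranked: relevance of each candidate, in the order the model
                       ranked them (position 0 = top-ranked).
    """
    rel = np.asarray(relevances_ranked, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))  # 1/log2(rank+1)
    dcg = float(np.sum(rel * discounts))
    ideal = np.sort(np.asarray(relevances_ranked, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[: len(ideal)]))
    return dcg / idcg if idcg > 0 else 0.0
```

Because a hedge like "I don't know" often has nonzero relevance under dense annotations, ranking it highly rarely hurts NDCG, whereas rank-1 metrics like MRR reward committing to the single best answer.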
Where Are You? Localization from Embodied Dialog
In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.
Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue
In this paper, we propose an Answer-Driven Visual State Estimator (ADVSE) to impose the effects of different answers on visual states.
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space
IP-based SeqDialN is our baseline with a simple 2-layer LSTM design that achieves decent performance.
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people?
History for Visual Dialog: Do we really need it?
Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.
Multi-View Attention Network for Visual Dialog
To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e.g., question, dialog history, and image) is required.