Visual Dialog

54 papers with code • 8 benchmarks • 10 datasets

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
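
The three inputs named above (image, dialog history, follow-up question) can be sketched as a small data structure. This is an illustration only: the class name `VisualDialogExample`, the helper `flatten_context`, the `[SEP]` separator, and the image path are hypothetical and are not taken from any VisDial codebase.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualDialogExample:
    """One Visual Dialog query: an image, prior QA turns, and a new question."""
    image_path: str                                   # hypothetical path to the image
    history: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer) pairs
    question: str = ""                                # the follow-up question to answer


def flatten_context(example: VisualDialogExample, sep: str = " [SEP] ") -> str:
    """Join the dialog history and the current question into one text string."""
    turns = [f"Q: {q} A: {a}" for q, a in example.history]
    turns.append(f"Q: {example.question}")
    return sep.join(turns)


example = VisualDialogExample(
    image_path="images/example.jpg",
    history=[("How many people are there?", "Two.")],
    question="Are they standing?",
)
context = flatten_context(example)
```

A model would consume `context` together with features from the image at `image_path` and produce an answer to the final question.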

Benchmarks

These leaderboards are used to track progress in Visual Dialog.

Dataset	Best Model
VisDial v0.9 val	9xFGA (VGG)
Visual Dialog v1.0 test-std	Single
VisDial v1.0 test-std	5xFGA + LS*+
ConvAI2	Multi-Modal BlenderBot
EmpatheticDialogues	Multi-Modal BlenderBot
Wizard of Wikipedia	Multi-Modal BlenderBot
BlendedSkillTalk	Multi-Modal BlenderBot
Image-Chat	Multi-Modal BlenderBot

Libraries

Use these libraries to find Visual Dialog models and implementations

naver/aqm-plus

3 papers

kdexd/lang-emerge-parlai

2 papers

zihaow123/unimm

2 papers

Most implemented papers

Visual Dialog

batra-mlp-lab/visdial-amt-chat • CVPR 2017

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content.

Hierarchical Question-Image Co-Attention for Visual Question Answering

jiasenlu/HieCoAttenVQA • NeurIPS 2016

In addition, our model reasons about the question (and consequently the image via the co-attention mechanism) in a hierarchical fashion via a novel 1-dimensional convolutional neural network (CNN).

Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning

sea-snell/implicit-language-q-learning • ICCV 2017

Specifically, we pose a cooperative 'image guessing' game between two agents -- Qbot and Abot -- who communicate in natural language dialog so that Qbot can select an unseen image from a lineup of images.

Audio Visual Scene-Aware Dialog (AVSD) Challenge at DSTC7

hudaAlamri/DSTC7-Audio-Visual-Scene-Aware-Dialog-AVSD-Challenge • 1 Jun 2018

Scene-aware dialog systems will be able to have conversations with users about the objects and events around them.

Visual Dialogue without Vision or Dialogue

danielamassiceti/CCA-visualdialogue • 16 Dec 2018

We characterise some of the quirks and shortcomings in the exploration of Visual Dialogue - a sequential question-answering task where the questions and corresponding answers are related through given visual stimuli.

Dual Attention Networks for Visual Reference Resolution in Visual Dialog

gicheonkang/DAN-VisDial • IJCNLP 2019

Specifically, the REFER module learns latent relationships between a given question and a dialog history by employing a self-attention mechanism.

Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline

vmurahari3/visdial-bert • ECCV 2020

Next, we find that additional finetuning using "dense" annotations in VisDial leads to even higher NDCG -- more than 10% over our base model -- but hurts MRR -- more than 17% below our base model!
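
The two metrics mentioned above can pull in different directions, and a small sketch may help show why. It assumes the common setup in which a model ranks a fixed list of candidate answers, with MRR scoring the position of a single ground-truth answer and NDCG scoring the ranking against graded ("dense") relevance values. The function names, candidate ids, and relevance values below are made up for illustration and are not taken from the paper's code.

```python
import math
from typing import Dict, List, Optional


def reciprocal_rank(ranked_ids: List[str], gt_id: str) -> float:
    """1 / rank of the single ground-truth answer (rank starts at 1)."""
    return 1.0 / (ranked_ids.index(gt_id) + 1)


def ndcg(ranked_ids: List[str], relevance: Dict[str, float],
         k: Optional[int] = None) -> float:
    """Normalized discounted cumulative gain over the top-k ranked candidates."""
    k = k or len(ranked_ids)

    def dcg(scores: List[float]) -> float:
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(s / math.log2(i + 2) for i, s in enumerate(scores))

    gains = [relevance.get(c, 0.0) for c in ranked_ids[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical candidates: "a" is the ground truth, "b" is partially relevant.
relevance = {"a": 1.0, "b": 0.5, "c": 0.0}
ranking = ["b", "a", "c"]

rr = reciprocal_rank(ranking, "a")
score = ndcg(ranking, relevance)
```

In this toy ranking the ground-truth answer sits in second place, so the reciprocal rank is halved, while NDCG still gives partial credit for placing a partially relevant answer first. Averaging the reciprocal rank over many questions gives MRR.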

History for Visual Dialog: Do we really need it?

shubhamagarwal92/visdial_conv • ACL 2020

Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.

Where Are You? Localization from Embodied Dialog

meera1hahn/Graph_LED • EMNLP 2020

In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.

The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

gicheonkang/gst-visdial • CVPR 2023

As a result, GST scales the amount of training data up to an order of magnitude that of VisDial (1.2M to 12.9M QA data).
