Visual Dialog
54 papers with code • 8 benchmarks • 10 datasets
Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
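The input/output shape of the task described above can be sketched as a minimal data structure (an illustrative sketch only; the class and function names here are hypothetical and not tied to any particular dataset or model):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class VisualDialogInstance:
    """One Visual Dialog example: an image, the dialog so far,
    and a follow-up question the agent must answer."""
    image_id: str                                  # reference to the image under discussion
    history: List[Tuple[str, str]] = field(default_factory=list)  # (question, answer) turns
    question: str = ""                             # current follow-up question

def answer_question(instance: VisualDialogInstance,
                    model: Callable[[str, List[Tuple[str, str]], str], str]) -> str:
    """A model consumes the image reference, the full dialog history,
    and the current question, and returns a free-form answer string."""
    return model(instance.image_id, instance.history, instance.question)
```

A trivial stand-in model (e.g. a function that ignores the image and echoes a canned answer) can be plugged in to exercise this interface before wiring up a real vision-language model.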
Latest papers with no code
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions
We also introduce DisNet, a novel dataset containing the proposed visual discourse annotations of 3000 videos and their paragraphs.
How to Fool Systems and Humans in Visually Grounded Interaction: A Case Study on Adversarial Attacks on Visual Dialog
Adversarial attacks change predictions of deep neural network models, while aiming to remain unnoticed by the user. This is a challenge for textual attacks, which target discrete text.
ViDA-MAN: Visual Dialog with Digital Humans
We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction, which offers real-time audio-visual responses to instant speech inquiries.
Evaluating and Improving Interactions with Hazy Oracles
Many AI systems integrate sensor inputs, world knowledge, and human-provided information to perform inference.
Variational Disentangled Attention for Regularized Visual Dialog
One of the most important challenges in visual dialog is to effectively extract the information from a given image and its conversation history that is relevant to the current question.
GoG: Relation-aware Graph-over-Graph Network for Visual Dialog
Specifically, GoG consists of three sequential graphs: 1) H-Graph, which aims to capture coreference relations within the dialog history; 2) History-aware Q-Graph, which aims to fully understand the question by capturing dependency relations between words based on coreference resolution over the dialog history; and 3) Question-aware I-Graph, which aims to capture the relations between objects in an image based on the full question representation.
Learning to Ground Visual Objects for Visual Dialog
Specifically, a posterior distribution over visual objects is inferred from both context (history and questions) and answers, and it ensures the appropriate grounding of visual objects during the training process.
Visual-Textual Alignment for Graph Inference in Visual Dialog
As a conversational intelligence task, visual dialog entails answering a series of questions grounded in an image, using the dialog history as context.
Reasoning Over History: Context Aware Visual Dialog
While neural models have been shown to exhibit strong performance on single-turn visual question answering (VQA) tasks, extending VQA to a multi-turn, conversational setting remains a challenge.
Multi-Modal Open-Domain Dialogue
Recent work in open-domain conversational agents has demonstrated that significant improvements in model engagingness and humanness metrics can be achieved via massive scaling in both pre-training data and model size (Adiwardana et al., 2020; Roller et al., 2020).