Visual Dialog
54 papers with code • 8 benchmarks • 10 datasets
Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
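The task definition above (image + dialog history + follow-up question → answer) can be sketched as a toy discriminative baseline. This is a minimal illustration, not any paper's method: real models encode the image with a CNN/transformer, but here a hypothetical `answer_question` simply scores candidate answers by word overlap with the caption, history, and question.

```python
def answer_question(image_caption, history, question, candidates):
    """Toy discriminative Visual Dialog baseline (hypothetical).

    image_caption: text stand-in for visual features
    history: list of (question, answer) pairs from earlier rounds
    candidates: candidate answers to rank, as in the VisDial setup
    Returns the candidate with the largest word overlap with the context.
    """
    # Build a bag-of-words context from the image, history, and question.
    context = set(image_caption.lower().split())
    for q, a in history:
        context |= set(q.lower().split()) | set(a.lower().split())
    context |= set(question.lower().split())

    # Score each candidate by overlap with the accumulated context.
    return max(candidates, key=lambda c: len(set(c.lower().split()) & context))
```

Ranking a fixed candidate list (rather than generating free-form text) mirrors the discriminative evaluation protocol used on the VisDial benchmark.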
Libraries
Use these libraries to find Visual Dialog models and implementations.
Most implemented papers
Recursive Visual Attention in Visual Dialog
Visual dialog is a challenging vision-language task, which requires the agent to answer multi-round questions about an image.
Large-Scale Answerer in Questioner's Mind for Visual Dialog Question Generation
Answerer in Questioner's Mind (AQM) is an information-theoretic framework that has been recently proposed for task-oriented dialog systems.
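The information-theoretic idea behind AQM-style question generation can be illustrated with a small sketch (an assumption for illustration, not the paper's implementation): the questioner keeps a posterior over possible targets and picks the question whose answer is expected to reduce its entropy the most.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def information_gain(prior, likelihood):
    """Expected entropy reduction over targets from asking one question.

    prior[t]        = P(target t)
    likelihood[a][t] = P(answer a | target t, question)  (rows sum to 1 per t)
    """
    h_prior = entropy(prior)
    gain = 0.0
    for lik_a in likelihood:                       # iterate possible answers
        p_a = sum(l * p for l, p in zip(lik_a, prior))
        if p_a == 0:
            continue
        posterior = [l * p / p_a for l, p in zip(lik_a, prior)]
        gain += p_a * (h_prior - entropy(posterior))
    return gain

def select_question(prior, questions):
    """Greedily pick the question with the largest expected information gain."""
    return max(questions, key=lambda q: information_gain(prior, questions[q]))
```

With a uniform prior over two targets, a question whose answer perfectly separates them yields a gain of ln 2, while an uninformative question yields zero, so the greedy rule prefers the former.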
Discourse Parsing in Videos: A Multi-modal Approach
We propose the task of Visual Discourse Parsing, which requires understanding discourse relations among scenes in a video.
CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog
Specifically, we construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset.
Factor Graph Attention
We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities.
Reasoning Visual Dialogs with Structural and Partial Observations
The answer to a given question is represented by a node with missing value.
Improving Generative Visual Dialog by Answering Diverse Questions
Prior work on training generative Visual Dialog models with reinforcement learning (Das et al.) has explored a Qbot-Abot image-guessing game and shown that this 'self-talk' approach can lead to improved performance on the downstream dialog-conditioned image-guessing task.
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Despite impressive recent progress that has been reported on tasks that necessitate reasoning, such as visual question answering and visual dialog, models often exploit biases in datasets.
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
More importantly, by visualizing the gate values we can tell which modality (visual or semantic) contributes more to answering the current question.
An Annotated Corpus of Reference Resolution for Interpreting Common Grounding
Common grounding is the process of creating, repairing and updating mutual understandings, which is a fundamental aspect of natural language conversation.