Visual Dialog

54 papers with code • 8 benchmarks • 10 datasets

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
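
The task is often posed discriminatively: instead of generating free text, the model ranks a fixed list of candidate answers (100 per question in VisDial v1.0). The sketch below shows that input/output contract. The schema, `rank_candidates`, and the word-overlap scorer are hypothetical stand-ins. A real model would also score candidates using the image, which this toy scorer ignores.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DialogTurn:
    question: str
    answer: str


@dataclass
class VisualDialogExample:
    """One Visual Dialog instance (hypothetical schema, not an official dataset format)."""
    image_id: str                     # placeholder for the image; real models consume image features
    caption: str                      # caption, often treated as round 0 of the history
    history: List[DialogTurn] = field(default_factory=list)  # previous question/answer rounds
    question: str = ""                # the follow-up question to be answered
    candidates: List[str] = field(default_factory=list)      # candidate answers to rank


def rank_candidates(example: VisualDialogExample,
                    score_fn: Callable[[VisualDialogExample, str], float]) -> List[str]:
    """Sort candidate answers by model score, best first (ties keep input order)."""
    return sorted(example.candidates, key=lambda c: score_fn(example, c), reverse=True)


def overlap_score(example: VisualDialogExample, candidate: str) -> float:
    """Toy scorer (hypothetical): count candidate words that appear in the caption or history."""
    text = " ".join([example.caption] + [f"{t.question} {t.answer}" for t in example.history])
    context = set(text.lower().replace(",", " ").replace("?", " ").split())
    return float(sum(word in context for word in candidate.lower().split()))


example = VisualDialogExample(
    image_id="img_0001",
    caption="a brown dog on a couch",
    history=[DialogTurn("is it a dog?", "yes, a brown dog")],
    question="what color is it?",
    candidates=["white", "brown", "i don't know"],
)
print(rank_candidates(example, overlap_score))  # → ['brown', 'white', "i don't know"]
```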

Perceptual Score: What Data Modalities Does Your Model Perceive?

itaigat/perceptual-score NeurIPS 2021

To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.

8
27 Oct 2021
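
The snippet describes the perceptual score only at a high level. As a rough illustration of the underlying idea, the sketch below measures how much accuracy drops when one modality is shuffled across examples, which breaks its pairing with the rest of the input. This permutation-based formulation and every name in it (`permutation_reliance`, `question_only_model`) are assumptions for illustration. The paper's exact definition and normalization may differ.

```python
import random
from typing import Callable, Dict, Sequence

Sample = Dict[str, object]  # one input split into named modalities, e.g. {"image": ..., "question": ...}


def accuracy(model: Callable[[Sample], object],
             samples: Sequence[Sample], labels: Sequence[object]) -> float:
    """Fraction of samples where the model's prediction equals the label."""
    return sum(model(s) == y for s, y in zip(samples, labels)) / len(labels)


def permutation_reliance(model, samples, labels, modality, n_repeats=5, seed=0):
    """Average accuracy drop when `modality` is shuffled across samples.

    Shuffling keeps the modality's marginal distribution but breaks its pairing
    with the other modalities; a model that ignores the modality shows no drop.
    """
    rng = random.Random(seed)
    base = accuracy(model, samples, labels)
    drops = []
    for _ in range(n_repeats):
        values = [s[modality] for s in samples]
        rng.shuffle(values)
        permuted = [{**s, modality: v} for s, v in zip(samples, values)]
        drops.append(base - accuracy(model, permuted, labels))
    return sum(drops) / len(drops)


# Hypothetical toy model that answers from the question text alone and never looks at the image.
def question_only_model(sample: Sample) -> int:
    return len(str(sample["question"])) % 2


samples = [{"image": f"img_{i}", "question": "q" * (i + 1)} for i in range(8)]
labels = [question_only_model(s) for s in samples]

print(permutation_reliance(question_only_model, samples, labels, "image"))  # → 0.0
# Shuffling questions can only lower accuracy here, since the unshuffled accuracy is 1.0.
print(permutation_reliance(question_only_model, samples, labels, "question"))
```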

Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser

zd11024/entity_questioner Findings (EMNLP) 2021

To enhance the VD Questioner: 1) we propose a Related entity enhanced Questioner (ReeQ) that generates questions under the guidance of related entities and learns an entity-based questioning strategy from human dialogs; 2) we propose an Augmented Guesser (AugG) that is strong and optimized especially for the VD setting.

3
06 Sep 2021

Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation

amazon-research/read-up CVPR 2021

Most existing work on the Guesser encodes the dialog history as a whole and trains Guesser models from scratch on the GuessWhat?! dataset.

7
24 May 2021

Ensemble of MRR and NDCG models for Visual Dialog

idansc/mrr-ndcg NAACL 2021

However, the NDCG metric favors generally applicable but uncertain answers such as "I don't know."

18
15 Apr 2021
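
Visual Dialog models are commonly scored with MRR, computed from the rank of the single ground-truth answer, and NDCG, computed from graded relevance scores over the candidate list. The sketch below computes both on a toy round in which a generic answer is ranked first. The cutoff K = number of candidates with nonzero relevance follows the commonly described VisDial v1.0 convention, but treat it as an assumption and check the official evaluation code. The function names and relevance values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def reciprocal_rank(ranked: Sequence[str], ground_truth: str) -> float:
    """1 / (1-based rank of the ground-truth answer in the ranked list)."""
    return 1.0 / (list(ranked).index(ground_truth) + 1)


def mean_reciprocal_rank(rankings: List[Sequence[str]], ground_truths: List[str]) -> float:
    """Average reciprocal rank over several dialog rounds."""
    return sum(reciprocal_rank(r, g) for r, g in zip(rankings, ground_truths)) / len(ground_truths)


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(position + 1) discount (positions are 1-based)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(ranked: Sequence[str], relevance: Dict[str, float]) -> float:
    """NDCG truncated at K = number of candidates with nonzero relevance (assumed convention)."""
    k = sum(1 for r in relevance.values() if r > 0)
    if k == 0:
        return 0.0
    gains = [relevance.get(c, 0.0) for c in ranked[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    return dcg(gains) / dcg(ideal)


# Toy round: the model ranks a generic answer above the ground truth "yes".
ranked = ["i don't know", "yes", "no"]
relevance = {"yes": 1.0, "i don't know": 0.5, "no": 0.0}  # hypothetical graded relevance

print(mean_reciprocal_rank([ranked], ["yes"]))  # → 0.5
print(round(ndcg(ranked, relevance), 2))        # → 0.86
```

The same ranking earns a mediocre MRR but a fairly high NDCG, because the generic answer carries partial relevance. That is the tension between the two metrics that the snippet points to.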

Where Are You? Localization from Embodied Dialog

meera1hahn/Graph_LED EMNLP 2020

In this paper, we focus on the LED (Localization from Embodied Dialog) task, providing a strong baseline model with detailed ablations that characterize both dataset biases and the importance of various modeling choices.

9
16 Nov 2020

Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue

zipengxuc/ADVSE-GuessWhat 1 Oct 2020

In this paper, we propose an Answer-Driven Visual State Estimator (ADVSE) to impose the effects of different answers on visual states.

8
01 Oct 2020

SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space

xiaoxiaoheimei/SeqDialN 2 Aug 2020

IP-based SeqDialN is our baseline, a simple 2-layer LSTM design that achieves decent performance.

6
02 Aug 2020

Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data

mcogswell/dialog_without_dialog NeurIPS 2020

Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people?

5
24 Jul 2020

History for Visual Dialog: Do we really need it?

shubhamagarwal92/visdial_conv ACL 2020

Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.

32
08 May 2020
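
Whether a model actually needs the dialog history can be probed by evaluating it with and without that history. Below is a minimal sketch of how such text inputs might be assembled. The function name, the `[SEP]` separator, and the choice to drop the caption along with the history are hypothetical choices for illustration, not the paper's setup.

```python
from typing import List, Optional, Tuple


def build_text_input(caption: str,
                     history: List[Tuple[str, str]],
                     question: str,
                     use_history: bool = True,
                     max_rounds: Optional[int] = None,
                     sep: str = " [SEP] ") -> str:
    """Join caption, past (question, answer) rounds, and the current question.

    use_history=False keeps only the current question (history-agnostic input);
    max_rounds keeps just the most recent rounds (the caption is always kept
    when history is used).
    """
    if not use_history:
        return question
    # Slice from the end; max(0, ...) makes max_rounds=0 yield no rounds.
    rounds = history if max_rounds is None else history[max(0, len(history) - max_rounds):]
    parts = [caption] + [f"{q} {a}" for q, a in rounds] + [question]
    return sep.join(parts)


caption = "a dog on a couch"
history = [("is it brown?", "yes"), ("is it awake?", "no")]
question = "is it on a pillow?"

print(build_text_input(caption, history, question))
# → a dog on a couch [SEP] is it brown? yes [SEP] is it awake? no [SEP] is it on a pillow?
print(build_text_input(caption, history, question, max_rounds=1))
# → a dog on a couch [SEP] is it awake? no [SEP] is it on a pillow?
print(build_text_input(caption, history, question, use_history=False))
# → is it on a pillow?
```

Comparing a model's scores on the full, truncated, and history-free inputs gives a rough sense of how much it relies on earlier rounds.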

Multi-View Attention Network for Visual Dialog

taesunwhang/MVAN-VisDial 29 Apr 2020

To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e.g., question, dialog history, and image) is required.

44
29 Apr 2020