Visual Dialog
54 papers with code • 8 benchmarks • 10 datasets
Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
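The task input/output can be pictured as a simple data structure: an image, a seed caption, a history of question-answer turns, and the current question to be answered. The class and field names below are hypothetical, chosen only to mirror the structure of datasets such as VisDial; they are not an official API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DialogTurn:
    """One question-answer exchange in the dialog history."""
    question: str
    answer: str

@dataclass
class VisualDialogInstance:
    """Hypothetical container for one Visual Dialog example."""
    image_id: str                 # identifier of the image under discussion
    caption: str                  # short description that seeds the dialog
    history: List[DialogTurn] = field(default_factory=list)

def answer_question(instance: VisualDialogInstance, question: str) -> str:
    """Placeholder for a model: map (image, history, question) to an answer."""
    raise NotImplementedError

# Example instance: the agent would be asked a follow-up about this image.
example = VisualDialogInstance(
    image_id="COCO_0001",
    caption="A man riding a horse on the beach",
    history=[DialogTurn("Is it daytime?", "Yes")],
)
```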
Libraries
Use these libraries to find Visual Dialog models and implementations.
Latest papers
Perceptual Score: What Data Modalities Does Your Model Perceive?
To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.
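One common way to measure how much a model relies on a given modality is to break that modality's pairing with the rest of the input (e.g., by permuting it across the batch) and record the accuracy drop. The sketch below follows that permutation idea; the paper's exact protocol may differ, and `model_fn`, the argument names, and the interface are assumptions for illustration.

```python
import numpy as np

def modality_reliance(model_fn, X_modalities, y, modality, n_permute=10, rng=None):
    """Hedged sketch of a permutation-based reliance score.

    model_fn: assumed interface, maps a list of per-modality arrays to
              predicted labels (one per example).
    Returns the mean accuracy drop when the chosen modality is shuffled
    across examples, breaking its alignment with the other modalities.
    """
    rng = np.random.default_rng(rng)
    base_acc = np.mean(model_fn(X_modalities) == y)
    drops = []
    for _ in range(n_permute):
        perm = rng.permutation(len(y))
        X_perm = [x.copy() for x in X_modalities]
        X_perm[modality] = X_perm[modality][perm]  # misalign this modality only
        drops.append(base_acc - np.mean(model_fn(X_perm) == y))
    return float(np.mean(drops))
```

A model that ignores a modality shows a drop of zero for it, while a modality the model depends on produces a large drop.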
Enhancing Visual Dialog Questioner with Entity-based Strategy Learning and Augmented Guesser
To enhance the VD Questioner: 1) we propose a Related entity enhanced Questioner (ReeQ) that generates questions under the guidance of related entities and learns an entity-based questioning strategy from human dialogs; 2) we propose an Augmented Guesser (AugG) that is strong and is optimized especially for the VD setting.
Learning Better Visual Dialog Agents with Pretrained Visual-Linguistic Representation
Most existing work for the Guesser encodes the dialog history as a whole and trains the Guesser models from scratch on GuessWhat?!
Ensemble of MRR and NDCG models for Visual Dialog
However, the NDCG metric favors generally applicable uncertain answers such as "I don't know."
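NDCG works over graded relevances rather than a single correct answer, so a safe answer that carries partial relevance for many questions can keep the score high when ranked near the top. A minimal NDCG@k sketch (standard formula; the helper name and inputs are illustrative, not the paper's code):

```python
import numpy as np

def ndcg_at_k(relevances_ranked, k):
    """NDCG@k for one candidate list, given graded relevances in ranked order.

    relevances_ranked: relevance of each candidate, in the order the model
                       ranked them (position 0 = top-ranked).
    """
    rel = np.asarray(relevances_ranked, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, len(rel) + 2))  # 1/log2(rank+1)
    dcg = float(np.sum(rel * discounts))
    ideal = np.sort(np.asarray(relevances_ranked, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[: len(ideal)]))
    return dcg / idcg if idcg > 0 else 0.0
```

Because a hedge like "I don't know" often has nonzero relevance under dense annotations, ranking it highly rarely hurts NDCG, whereas rank-1 metrics like MRR reward committing to the single best answer.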
Where Are You? Localization from Embodied Dialog
In this paper, we focus on the LED task -- providing a strong baseline model with detailed ablations characterizing both dataset biases and the importance of various modeling choices.
Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue
In this paper, we propose an Answer-Driven Visual State Estimator (ADVSE) to impose the effects of different answers on visual states.
SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space
IP-based SeqDialN is our baseline with a simple 2-layer LSTM design that achieves decent performance.
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people?
History for Visual Dialog: Do we really need it?
Visual Dialog involves "understanding" the dialog history (what has been discussed previously) and the current question (what is asked), in addition to grounding information in the image, to generate the correct response.
Multi-View Attention Network for Visual Dialog
To resolve the visual dialog task, a high-level understanding of various multimodal inputs (e.g., question, dialog history, and image) is required.