Visual Dialog

54 papers with code • 8 benchmarks • 10 datasets

Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a follow-up question about the image, the task is to answer the question.
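The input triple described above (image, dialog history, follow-up question) can be sketched as a small data structure. This is an illustrative sketch only; the class and field names are hypothetical, not from any particular dataset loader, though the layout mirrors VisDial-style examples.

```python
from dataclasses import dataclass

@dataclass
class DialogTurn:
    question: str
    answer: str

@dataclass
class VisualDialogExample:
    image_id: str            # reference to the grounding image
    caption: str             # image caption, often used as turn zero
    history: list            # prior DialogTurn objects
    question: str            # the follow-up question to answer

def build_prompt(ex: VisualDialogExample) -> str:
    """Flatten caption + history + question into one prompt string,
    a common input format for generative visual dialog models."""
    lines = [ex.caption]
    for turn in ex.history:
        lines.append(f"Q: {turn.question} A: {turn.answer}")
    lines.append(f"Q: {ex.question} A:")
    return "\n".join(lines)

ex = VisualDialogExample(
    image_id="COCO_000000123456",
    caption="A man riding a horse on a beach.",
    history=[DialogTurn("is it daytime?", "yes")],
    question="what color is the horse?",
)
print(build_prompt(ex))
```

The model's job is then to produce (or rank) the answer string that completes the final `A:`.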

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

dvlab-research/minigemini 27 Mar 2024

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

2,829 stars • 27 Mar 2024

Collecting Visually-Grounded Dialogue with A Game Of Sorts

willemsenbram/a-game-of-sorts LREC 2022

We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call "A Game Of Sorts".

3 stars • 10 Sep 2023

PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts

AlibabaResearch/DAMO-ConvAI 24 May 2023

It utilizes a combination of several fundamental experts to accommodate multiple dialogue-related tasks and can be pre-trained using limited dialogue and extensive non-dialogue multi-modal data.

967 stars • 24 May 2023

Unified Multimodal Model with Unlikelihood Training for Visual Dialog

zihaow123/unimm 23 Nov 2022

Prior work performs standard likelihood training for answer generation only on positive instances (those involving correct answers).
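The contrast the paper draws is between likelihood training on correct answers and an added unlikelihood term that pushes down the probability of tokens from wrong answers. Below is a minimal numerical sketch of that idea in the style of Welleck et al.'s unlikelihood objective; the function names and toy probabilities are illustrative, not this paper's implementation.

```python
import math

def likelihood_loss(probs, target_ids):
    """Standard NLL: raise the probability of each gold (positive) token."""
    return -sum(math.log(p[t]) for p, t in zip(probs, target_ids)) / len(target_ids)

def unlikelihood_loss(probs, negative_ids):
    """Unlikelihood term: lower the probability of tokens drawn from
    negative (incorrect) answers via -log(1 - p(token))."""
    return -sum(math.log(1.0 - p[t]) for p, t in zip(probs, negative_ids)) / len(negative_ids)

# toy next-token distributions over a 4-token vocabulary at two decoding steps
probs = [[0.7, 0.1, 0.1, 0.1],
         [0.2, 0.6, 0.1, 0.1]]

nll = likelihood_loss(probs, [0, 1])    # gold tokens: 0 then 1
ul = unlikelihood_loss(probs, [3, 3])   # token 3 appears in a wrong answer
total = nll + ul                        # combined training objective
```

Training on the combined objective both rewards correct answers and explicitly penalizes incorrect ones, rather than relying on likelihood alone.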

13 stars • 23 Nov 2022

LAVIS: A Library for Language-Vision Intelligence

salesforce/lavis 15 Sep 2022

We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research and applications.

8,722 stars • 15 Sep 2022

Video Dialog as Conversation about Objects Living in Space-Time

hoanganhpham1006/cost 8 Jul 2022

To tackle these challenges, we present a new object-centric framework for video dialog that supports neural reasoning, dubbed COST (Conversation about Objects in Space-Time).

31 stars • 8 Jul 2022

VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution

hkust-knowcomp/vd-pcr 29 May 2022

In this paper, we propose VD-PCR, a novel framework to improve Visual Dialog understanding with Pronoun Coreference Resolution in both implicit and explicit ways.

8 stars • 29 May 2022

The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

gicheonkang/gst-visdial CVPR 2023

As a result, GST scales the training data to an order of magnitude more than VisDial (from 1.2M to 12.9M QA pairs).

17 stars • 25 May 2022

Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene

zd11024/spot_difference 16 Mar 2022

Visual dialog has witnessed great progress after introducing various vision-oriented goals into the conversation, notably GuessWhich and GuessWhat, where the image is visible to only one of the interlocutors or to both of them, respectively.

4 stars • 16 Mar 2022

UNITER-Based Situated Coreference Resolution with Rich Multimodal Input

i-need-sleep/mmcoref_cleaned 7 Dec 2021

Our model ranks second in the official evaluation on the object coreference resolution task with an F1 score of 73.3% after model ensembling.

2 stars • 7 Dec 2021