CLEVR-Dialog is a large diagnostic dataset for studying multi-round reasoning in visual dialog. Specifically, that authors construct a dialog grammar that is grounded in the scene graphs of the images from the CLEVR dataset. This combination results in a dataset where all aspects of the visual dialog are fully annotated. In total, CLEVR-Dialog contains 5 instances of 10-round dialogs for about 85k CLEVR images, totaling to 4.25M question-answer pairs.
10 PAPERS • NO BENCHMARKS YET
A large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation.
VisPro dataset contains coreference annotation of 29,722 pronouns from 5,000 dialogues.
6 PAPERS • NO BENCHMARKS YET
A Game Of Sorts is a collaborative image ranking task. Players are asked to rank a set of images based on a given sorting criterion. The game provides a framework for the evaluation of visually grounded language understanding and generation of referring expressions in multimodal dialogue settings.
2 PAPERS • NO BENCHMARKS YET