VD-Ref is a dataset with ground-truth mappings from both noun phrases and pronouns to image regions. This dataset contains a set of 10k complete sets from the VisDialog dataset, and uses the StanfordCoreNLP tool to tokenize the sentences, making it proper for the succeeding human annotation.

Source: Extending Phrase Grounding with Pronouns in Visual Dialogues

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Modalities


Languages