VQDv1 (Visual Query Detection v1)

Introduced by Acharya et al. in VQD: Visual Query Detection in Natural Scenes

In Visual Query Detection (VQD), a system is given a natural language query (prompt) and an image, and must produce 0 to N bounding boxes that satisfy the query. VQD is related to several other computer vision tasks, but it captures abilities those tasks ignore. Unlike object detection, VQD can handle attributes of and relations among objects in the scene. In VQA, algorithms often produce the right answers due to dataset bias without "looking" at the relevant image regions. Referring Expression Recognition (RER) datasets have short, often ambiguous prompts, and because they require only a single output box, they make it easier to exploit dataset biases. VQD instead requires goal-directed object detection and outputting a variable number of boxes to answer a query.
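Because a VQD system may output anywhere from zero to N boxes per query, evaluation has to handle the empty case and match predicted boxes to ground truth one-to-one. The sketch below illustrates one common convention, greedy IoU matching at a 0.5 threshold; this is an assumption for illustration, not the official VQDv1 evaluation protocol.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def score_query(pred_boxes, gt_boxes, thresh=0.5):
    """Precision, recall, and F1 for one query; either box list may be empty."""
    if not pred_boxes and not gt_boxes:
        return 1.0, 1.0, 1.0  # correctly predicted "no boxes satisfy the query"
    if not pred_boxes or not gt_boxes:
        return 0.0, 0.0, 0.0
    matched_gt = set()
    tp = 0
    for p in pred_boxes:
        # Greedily match each prediction to its best unmatched ground-truth box.
        best, best_j = 0.0, -1
        for j, g in enumerate(gt_boxes):
            if j in matched_gt:
                continue
            o = iou(p, g)
            if o > best:
                best, best_j = o, j
        if best >= thresh:
            matched_gt.add(best_j)
            tp += 1
    prec = tp / len(pred_boxes)
    rec = tp / len(gt_boxes)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

Note that the zero-box case is what distinguishes VQD scoring from standard detection metrics: a query with no valid regions is answered correctly only by predicting nothing.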

Statistics:

In VQDv1, the number of bounding boxes per image ranges from 0 to 15. VQDv1 contains 123K images and 621K questions, divided into three categories: 391K simple questions, 172K color questions, and 58K positional-reasoning questions.
