In Visual Query Detection (VQD), a system is given a natural-language query (prompt) and an image, and it must produce 0 to N bounding boxes that satisfy the query. VQD is related to several other computer-vision tasks, but it captures abilities those tasks ignore. Unlike object detection, VQD can handle attributes of, and relations among, objects in the scene. In VQA, algorithms often produce the right answer through dataset bias without 'looking' at the relevant image regions. Referring Expression Recognition (RER) datasets have short, often ambiguous prompts, and because they require only a single box as output, they make it easier to exploit dataset biases. VQD instead requires goal-directed object detection and outputting a variable number of boxes that answer the query.
In VQDv1, the number of bounding boxes per image ranges from 0 to 15. VQDv1 contains 123K images and 621K questions, divided into three categories: 391K Simple Questions, 172K Color Questions, and 58K Positional Reasoning Questions.
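Because a VQD query can have zero ground-truth boxes, an empty prediction can be a correct answer. A minimal sketch of how such variable-length box predictions might be scored, using standard greedy IoU matching (illustrative only, not the official VQDv1 evaluation protocol):

```python
# Illustrative sketch (NOT the official VQDv1 metric): score a VQD-style
# prediction against ground truth by greedy IoU matching. A query may have
# zero ground-truth boxes, so predicting nothing can be a perfect answer.

def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_boxes(pred, gt, thresh=0.5):
    """Greedily match predicted boxes to ground-truth boxes at an IoU
    threshold. Returns (true_positives, false_positives, false_negatives)."""
    unmatched_gt = list(gt)
    tp = 0
    for p in pred:
        best = max(unmatched_gt, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched_gt.remove(best)
            tp += 1
    return tp, len(pred) - tp, len(unmatched_gt)

# Zero-box query: an empty prediction against empty ground truth is correct.
print(match_boxes([], []))                              # (0, 0, 0)
# Overlapping boxes with IoU 0.81 match at the 0.5 threshold.
print(match_boxes([(0, 0, 10, 10)], [(1, 1, 10, 10)]))  # (1, 0, 0)
```

Metrics such as precision and recall can then be aggregated over all queries, which penalizes both spurious boxes on zero-box queries and missed boxes on multi-box queries.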