Description Detection Dataset

Description Detection Dataset ($D^3$, /dikju:b/) is an attempt at creating a next-generation object detection dataset. Unlike traditional detection datasets, the class names of the objects are no longer simple nouns or noun phrases, but rather complex and descriptive, such as a dog not being held by a leash. For each image in the dataset, any object that matches the description is annotated. The dataset provides annotations such as bounding boxes and finely crafted instance masks.It comprises of 422 well-designed descriptions and 24,282 positive object-description pairs.

The dataset is meant for the Described Object Detection (DOD) task. OVD detects object based on category name, and each category can have zero to multiple instances; REC grounds one region based on a language description, whether the object truly exits or not; DOD detects all instances on each image in the dataset, based on a flexible reference.

Homepage