Described Object Detection

8 papers with code • 1 benchmark • 1 dataset

Described Object Detection (DOD) detects all object instances in an image that match a flexible language description. It is a superset of Open-Vocabulary Object Detection (OVD) and Referring Expression Comprehension (REC): it expands OVD's category names to flexible language expressions, and it removes REC's assumption that the described object always exists in the image. Works related to DOD are tracked in the awesome-DOD list on GitHub.
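As a sketch of how the three query types differ, the hypothetical Python snippet below contrasts them. `detect` is a placeholder standing in for any DOD-capable model, not a real API:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

def detect(image_path: str, expression: str) -> List[Box]:
    """Stand-in for a DOD-capable model: return every box whose instance
    matches `expression`. A real model would run inference here."""
    return []  # placeholder result

image = "street_scene.jpg"  # hypothetical input

# OVD: the query is a bare category name; many instances may match.
ovd_boxes = detect(image, "dog")

# REC: a free-form expression, but exactly one referred object is
# assumed to exist, so evaluation expects a single box.
rec_boxes = detect(image, "the dog chasing the frisbee")

# DOD: free-form expressions evaluated over all instances, where an
# empty result is valid (the described object may be absent).
dod_boxes = detect(image, "a dog that is not chasing anything")
```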

Most implemented papers

Grounded Language-Image Pre-training

microsoft/GLIP CVPR 2022

Unifying object detection and phrase grounding for pre-training brings two benefits: 1) it allows GLIP to learn from both detection and grounding data to improve both tasks and bootstrap a good grounding model; 2) GLIP can leverage massive image-text pairs by generating grounding boxes in a self-training fashion, making the learned representations semantically rich.
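The self-training step can be pictured as a pseudo-labeling loop: a teacher grounding model labels web image-text pairs, and its confident boxes become extra supervision. The sketch below is illustrative only; `extract_phrases` and the teacher's `ground` method are hypothetical stand-ins for GLIP's actual pipeline:

```python
from typing import List, Tuple

def extract_phrases(caption: str) -> List[str]:
    # Crude stand-in: a real pipeline would parse noun phrases with NLP tools.
    return [p.strip() for p in caption.split(",") if p.strip()]

def self_train_round(teacher, image_text_pairs, score_thresh: float = 0.5):
    """Pseudo-label image-text pairs with a teacher grounding model.

    `teacher.ground(image, phrase)` is assumed to yield (box, score) pairs.
    """
    pseudo_labeled = []
    for image, caption in image_text_pairs:
        for phrase in extract_phrases(caption):
            for box, score in teacher.ground(image, phrase):
                if score >= score_thresh:
                    # Confident boxes become grounding labels, later merged
                    # with gold detection/grounding data for the next round.
                    pseudo_labeled.append((image, phrase, box))
    return pseudo_labeled
```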

Simple Open-Vocabulary Object Detection with Vision Transformers

google-research/scenic 12 May 2022

Combining simple architectures with large-scale pre-training has led to massive improvements in image classification.

Described Object Detection: Liberating Object Detection with Flexible Expressions

charles-xie/awesome-described-object-detection NeurIPS 2023

In this paper, we advance OVD and REC to a more practical setting called Described Object Detection (DOD) by expanding category names to flexible language expressions for OVD and overcoming REC's limitation of only grounding objects that are assumed to exist in the image.

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

microsoft/fiber NeurIPS 2022

Vision-language (VL) pre-training has recently received considerable attention.

Universal Instance Perception as Object Discovery and Retrieval

MasterBin-IIAU/UNINEXT CVPR 2023

All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks.

CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

tgxs002/cora CVPR 2023

We propose CORA, a DETR-style framework that adapts CLIP for Open-Vocabulary detection through Region prompting and Anchor pre-matching.

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

alpha-vllm/llama2-accessory 13 Nov 2023

We present SPHINX, a versatile multi-modal large language model (MLLM) with a joint mixing of model weights, tuning tasks, and visual embeddings.

An Open and Comprehensive Pipeline for Unified Object Grounding and Detection

open-mmlab/mmdetection 4 Jan 2024

Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC).
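For a sense of how this pipeline is used in practice, here is a minimal inference sketch with MMDetection's `DetInferencer`, assuming an MMDetection 3.x installation. The model alias is an assumption; check the mmdetection repo for the exact config and weight names:

```python
from mmdet.apis import DetInferencer

# Assumed config alias for an MM-Grounding-DINO checkpoint; verify the
# actual name in the open-mmlab/mmdetection model zoo before running.
inferencer = DetInferencer(
    model='grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det',
)

# OVD-style prompt: categories separated by ' . ' (the convention used by
# Grounding-DINO-style models); REC-style free text also works.
results = inferencer(
    'demo/demo.jpg',
    texts='bench . person .',
    out_dir='outputs',
)
```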