TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Zero-Shot Object Detection	LVIS v1.0 minival	MQ-GLIP-T	AP	30.4	# 5
Zero-Shot Object Detection	LVIS v1.0 minival	MQ-GroundingDINO-T	AP	30.2	# 6
Zero-Shot Object Detection	LVIS v1.0 minival	MQ-GLIP-L	AP	43.4	# 2
Zero-Shot Object Detection	LVIS v1.0 val	MQ-GLIP-T	AP	22.6	# 4
Zero-Shot Object Detection	LVIS v1.0 val	MQ-GroundingDINO-T	AP	22.1	# 5
Zero-Shot Object Detection	LVIS v1.0 val	MQ-GLIP-L	AP	34.7	# 2
Zero-Shot Object Detection	ODinW	MQ-GLIP-L	Average Score	23.9	# 2
Few-Shot Object Detection	ODinW-13	MQ-GLIP-T	Average Score	57	# 1
Few-Shot Object Detection	ODinW-35	MQ-GLIP-T	Average Score	43	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-modal-queried-object-detection-in-the/few-shot-object-detection-on-odinw-13)](https://paperswithcode.com/sota/few-shot-object-detection-on-odinw-13?p=multi-modal-queried-object-detection-in-the)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-modal-queried-object-detection-in-the/few-shot-object-detection-on-odinw-35)](https://paperswithcode.com/sota/few-shot-object-detection-on-odinw-35?p=multi-modal-queried-object-detection-in-the)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-modal-queried-object-detection-in-the/zero-shot-object-detection-on-lvis-v1-0)](https://paperswithcode.com/sota/zero-shot-object-detection-on-lvis-v1-0?p=multi-modal-queried-object-detection-in-the)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-modal-queried-object-detection-in-the/zero-shot-object-detection-on-lvis-v1-0-val)](https://paperswithcode.com/sota/zero-shot-object-detection-on-lvis-v1-0-val?p=multi-modal-queried-object-detection-in-the)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-modal-queried-object-detection-in-the/zero-shot-object-detection-on-odinw)](https://paperswithcode.com/sota/zero-shot-object-detection-on-odinw?p=multi-modal-queried-object-detection-in-the)`

Multi-modal Queried Object Detection in the Wild

NeurIPS 2023 · Yifan Xu, Mengdan Zhang, Chaoyou Fu, Peixian Chen, Xiaoshan Yang, Ke Li, Changsheng Xu

We introduce MQ-Det, an efficient architecture and pre-training strategy that uses both textual descriptions, which offer open-set generalization, and visual exemplars, which offer rich description granularity, as category queries (Multi-modal Queried object Detection) for real-world detection with open-vocabulary categories and varying granularity. MQ-Det incorporates vision queries into existing well-established language-queried-only detectors. A plug-and-play gated class-scalable perceiver module on top of the frozen detector is proposed to augment category text with class-wise visual information. To address the learning inertia problem brought by the frozen detector, a vision conditioned masked language prediction strategy is proposed. MQ-Det's simple yet effective architecture and training strategy are compatible with most language-queried object detectors, thus yielding versatile applications. Experimental results demonstrate that multi-modal queries largely boost open-world detection. For instance, MQ-Det significantly improves the state-of-the-art open-set detector GLIP by +7.8% AP on the LVIS benchmark via multi-modal queries without any downstream finetuning, and by +6.3% AP on average across 13 few-shot downstream tasks, with merely 3% additional modulating time required by GLIP. Code is available at https://github.com/YifanXu74/MQ-Det.
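
To make the idea of augmenting category text with class-wise visual information more concrete, here is a minimal sketch of a gated cross-attention step in plain Python. It is not the authors' implementation: the function names (`gated_vision_augment`, `augment_all`), the single-head dot-product attention, and the scalar `tanh` gate are all assumptions chosen for illustration. The only properties taken from the abstract are that each category's text query is enriched using visual exemplars of that same category, and that a gate controls how much visual information is mixed in.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def gated_vision_augment(text_emb, vision_queries, gate):
    """Hypothetical gated cross-attention for ONE category.

    text_emb:       text embedding of the category (list of d floats)
    vision_queries: exemplar features for the same category (lists of d floats)
    gate:           scalar gate parameter (assumed learnable in a real model);
                    gate = 0 gives tanh(0) = 0, so the text is left unchanged
    """
    if not vision_queries:
        # No exemplars for this category: fall back to the text-only query.
        return list(text_emb)
    d = len(text_emb)
    # The text embedding acts as the attention query over its own exemplars.
    scores = [dot(text_emb, v) / math.sqrt(d) for v in vision_queries]
    weights = softmax(scores)
    attended = [
        sum(w * v[i] for w, v in zip(weights, vision_queries)) for i in range(d)
    ]
    g = math.tanh(gate)
    # Gated residual: the original text embedding plus scaled visual content.
    return [t + g * a for t, a in zip(text_emb, attended)]


def augment_all(text_embs, exemplars_by_class, gate):
    """Apply the step class by class, so a category never attends to
    exemplars that belong to a different category."""
    return {
        name: gated_vision_augment(emb, exemplars_by_class.get(name, []), gate)
        for name, emb in text_embs.items()
    }
```

For example, `augment_all({"cat": [1.0, 0.0]}, {"cat": [[0.0, 2.0]]}, gate=0.0)` returns the text embeddings unchanged, which mirrors the idea of starting from the behavior of the frozen language-queried detector and letting the gate open gradually during training. Because the work is done per category, the number of categories can vary freely between calls.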

Code

yifanxu74/mq-det official

228

Tasks

Few-Shot Object Detection

Object

object-detection

Object Detection

Zero-Shot Object Detection

Datasets

MS COCO

LVIS

Objects365

Results from the Paper

Ranked #1 on Few-Shot Object Detection on ODinW-35
