TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Referring Expression Segmentation	A2D Sentences	SgMg (Video-Swin-B)	Precision@0.5	0.843	# 2
Referring Expression Segmentation	A2D Sentences	SgMg (Video-Swin-B)	Precision@0.9	0.259	# 1
Referring Expression Segmentation	A2D Sentences	SgMg (Video-Swin-B)	IoU overall	0.799	# 2
Referring Expression Segmentation	A2D Sentences	SgMg (Video-Swin-B)	IoU mean	0.720	# 2
Referring Expression Segmentation	A2D Sentences	SgMg (Video-Swin-B)	Precision@0.6	0.822	# 2
Referring Expression Segmentation	A2D Sentences	SgMg (Video-Swin-B)	Precision@0.7	0.767	# 1
Referring Expression Segmentation	A2D Sentences	SgMg (Video-Swin-B)	Precision@0.8	0.617	# 1
Referring Expression Segmentation	A2D Sentences	SgMg (Video-Swin-B)	AP	0.585	# 1
Referring Expression Segmentation	DAVIS 2017 (val)	SgMg	J&F 1st frame	63.3	# 3
Referring Expression Segmentation	J-HMDB	SgMg (Video-Swin-B)	Precision@0.5	0.972	# 1
Referring Expression Segmentation	J-HMDB	SgMg (Video-Swin-B)	Precision@0.6	0.917	# 1
Referring Expression Segmentation	J-HMDB	SgMg (Video-Swin-B)	Precision@0.7	0.714	# 1
Referring Expression Segmentation	J-HMDB	SgMg (Video-Swin-B)	Precision@0.8	0.225	# 1
Referring Expression Segmentation	J-HMDB	SgMg (Video-Swin-B)	Precision@0.9	0.003	# 3
Referring Expression Segmentation	J-HMDB	SgMg (Video-Swin-B)	AP	0.450	# 1
Referring Expression Segmentation	J-HMDB	SgMg (Video-Swin-B)	IoU overall	0.737	# 1
Referring Expression Segmentation	J-HMDB	SgMg (Video-Swin-B)	IoU mean	0.725	# 1
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	SgMg (Pre-training, Video-Swin-B)	J&F	65.7	# 9
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	SgMg (Pre-training, Video-Swin-B)	J	63.9	# 8
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	SgMg (Pre-training, Video-Swin-B)	F	67.4	# 8
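
The A2D Sentences and J-HMDB rows above report Precision@K, overall IoU and mean IoU. As a rough illustration of how these three quantities relate, here is a minimal pure-Python sketch. It assumes the commonly used definitions (Precision@K as the fraction of samples whose IoU exceeds K, overall IoU as total intersection over total union, mean IoU as the average of per-sample IoUs). The function names, the flat-list mask format and the handling of empty unions are illustrative assumptions, not the official evaluation code for these benchmarks.

```python
def iou_counts(pred, gt):
    """Return (intersection, union) pixel counts for two flat binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def summarize(pairs, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Compute mean IoU, overall IoU and Precision@K over (pred, gt) mask pairs."""
    counts = [iou_counts(pred, gt) for pred, gt in pairs]
    # Assumption: a sample with an empty union scores 0.0.
    ious = [i / u if u else 0.0 for i, u in counts]
    total_inter = sum(i for i, _ in counts)
    total_union = sum(u for _, u in counts)
    result = {
        "IoU mean": sum(ious) / len(ious),
        "IoU overall": total_inter / total_union if total_union else 0.0,
    }
    for t in thresholds:
        # Assumption: strict inequality at the threshold.
        result[f"Precision@{t}"] = sum(v > t for v in ious) / len(ious)
    return result


# Three toy samples with 4-pixel masks: a perfect match, a partial match, a miss.
pairs = [
    ([1, 1, 0, 0], [1, 1, 0, 0]),
    ([1, 1, 1, 0], [1, 0, 0, 0]),
    ([0, 0, 1, 1], [1, 1, 0, 0]),
]
metrics = summarize(pairs)
```

Overall IoU pools pixels across samples, so large objects weigh more, while mean IoU treats every sample equally. That is why the two values can differ for the same model.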

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spectrum-guided-multi-granularity-referring/referring-expression-segmentation-on-a2d)](https://paperswithcode.com/sota/referring-expression-segmentation-on-a2d?p=spectrum-guided-multi-granularity-referring)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spectrum-guided-multi-granularity-referring/referring-expression-segmentation-on-j-hmdb)](https://paperswithcode.com/sota/referring-expression-segmentation-on-j-hmdb?p=spectrum-guided-multi-granularity-referring)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spectrum-guided-multi-granularity-referring/referring-expression-segmentation-on-davis)](https://paperswithcode.com/sota/referring-expression-segmentation-on-davis?p=spectrum-guided-multi-granularity-referring)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/spectrum-guided-multi-granularity-referring/referring-expression-segmentation-on-refer-1)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refer-1?p=spectrum-guided-multi-granularity-referring)`

Spectrum-guided Multi-granularity Referring Video Object Segmentation

ICCV 2023 · Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian

Current referring video object segmentation (R-VOS) techniques extract conditional kernels from encoded (low-resolution) vision-language features and use them to segment the decoded high-resolution features. We discovered that this causes significant feature drift, which the segmentation kernels struggle to perceive during the forward computation and which degrades their segmentation ability. To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks. In addition, we propose Spectrum-guided Cross-modal Fusion (SCF) to perform intra-frame global interactions in the spectral domain for effective multimodal representation. Finally, we extend SgMg to perform multi-object R-VOS, a new paradigm that enables simultaneous segmentation of multiple referred objects in a video. This makes R-VOS not only faster but also more practical. Extensive experiments show that SgMg achieves state-of-the-art performance on four video benchmark datasets, outperforming the nearest competitor by 2.8 percentage points on Ref-YouTube-VOS. Our extended SgMg enables multi-object R-VOS and runs about 3 times faster while maintaining satisfactory performance. Code is available at https://github.com/bo-miao/SgMg.
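
The abstract describes global intra-frame interaction in the spectral domain but does not spell out how SCF is built. The sketch below is therefore not the authors' SCF module. It is a generic NumPy illustration of why frequency-domain filtering is global: multiplying a feature map's 2-D spectrum pointwise by a set of weights makes every output position depend on every input position. The function name `spectral_filter`, the tensor shapes and the two example filters are assumptions chosen for illustration.

```python
import numpy as np


def spectral_filter(features, weights):
    """Filter a (C, H, W) feature map by pointwise multiplication of its 2-D spectrum."""
    _, h, w = features.shape
    # Real FFT over the spatial axes: result has shape (C, H, W // 2 + 1), complex.
    spectrum = np.fft.rfft2(features, axes=(-2, -1))
    filtered = spectrum * weights
    # Back to the spatial domain at the original resolution.
    return np.fft.irfft2(filtered, s=(h, w), axes=(-2, -1))


rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 8, 8))  # hypothetical per-frame features

# All-ones weights leave the features unchanged.
identity = np.ones((4, 8, 5))
same = spectral_filter(feats, identity)

# Keeping only the zero-frequency term spreads each channel's spatial mean
# to every position, the extreme case of a global interaction.
dc_only = np.zeros((4, 8, 5))
dc_only[:, 0, 0] = 1.0
pooled = spectral_filter(feats, dc_only)
```

In a learned setting the weights would be trainable and, for cross-modal fusion, conditioned on the language features in some way. How SgMg does this is documented in the paper and the linked repository, not on this page.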

Code

bo-miao/sgmg official

Tasks

Object

Referring Expression Segmentation

Referring Video Object Segmentation

Segmentation

Semantic Segmentation

Video Object Segmentation

Video Semantic Segmentation

Datasets

DAVIS 2017

JHMDB

Referring Expressions for DAVIS 2016 & 2017

Refer-YouTube-VOS

A2D Sentences

Results from the Paper

Ranked #1 on Referring Expression Segmentation on J-HMDB (using extra training data)

