TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Object Segmentation	DAVIS 2017 (val)	UniVS(Swin-L)	Mean Jaccard & F-Measure	76.2	# 14
Video Object Segmentation	DAVIS 2017 (val)	UniVS(Swin-L)	Jaccard	72.8	# 16
Video Object Segmentation	DAVIS 2017 (val)	UniVS(Swin-L)	F-measure	79.5	# 15
Referring Expression Segmentation	DAVIS 2017 (val)	UniVS(Swin-L)	J&F 1st frame	59.4?	# 14
Referring Expression Segmentation	DAVIS 2017 (val)	UniVS(Swin-L)	J&F Full video	59.4	# 1
Video Instance Segmentation	OVIS validation	UniVS(Swin-L)	mask AP	41.7	# 16
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	UniVS(Swin-L)	J&F	58.0	# 17
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	UniVS(Swin-L)	J	56.8	# 16
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	UniVS(Swin-L)	F	59.5	# 16
Video Panoptic Segmentation	VIPSeg	UniVS(Swin-L)	VPQ	49.3	# 7
Video Panoptic Segmentation	VIPSeg	UniVS(Swin-L)	STQ	58.2	# 1
Video Semantic Segmentation	VSPW	UniVS(Swin-L)	mIoU	59.8	# 2
Video Instance Segmentation	YouTube-VIS 2021	UniVS(Swin-L)	mask AP	57.9	# 10
Video Instance Segmentation	YouTube-VIS 2021	UniVS(Swin-L)	AP50	79.4	# 12
Video Instance Segmentation	YouTube-VIS 2021	UniVS(Swin-L)	AP75	63.3	# 11
Video Instance Segmentation	YouTube-VIS 2021	UniVS(Swin-L)	AR10	63.1	# 10
Video Instance Segmentation	YouTube-VIS 2021	UniVS(Swin-L)	AR1	46.2	# 11
Video Instance Segmentation	YouTube-VIS validation	UniVS(Swin-L)	mask AP	60.0	# 15
Video Instance Segmentation	YouTube-VIS validation	UniVS(Swin-L)	AP50	82.1	# 14
Video Instance Segmentation	YouTube-VIS validation	UniVS(Swin-L)	AP75	65.3	# 17
Video Instance Segmentation	YouTube-VIS validation	UniVS(Swin-L)	AR1	54.7	# 11
Video Instance Segmentation	YouTube-VIS validation	UniVS(Swin-L)	AR10	66.8	# 10
Video Object Segmentation	YouTube-VOS 2018	UniVS(Swin-L)	Mean Jaccard & F-Measure	71.5	# 13

UniVS: Unified and Universal Video Segmentation with Prompts as Queries

28 Feb 2024 · Minghan Li, Shuai Li, Xindong Zhang, Lei Zhang

Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge. This is mainly because generic category-specified VS tasks need to detect all objects and track them across consecutive frames, while prompt-guided VS tasks require re-identifying the target with visual/text prompts throughout the entire video, making it hard to handle the different tasks with the same architecture. We make an attempt to address these issues and present a novel unified VS architecture, namely UniVS, by using prompts as queries. UniVS averages the prompt features of the target from previous frames as its initial query to explicitly decode masks, and introduces a target-wise prompt cross-attention layer in the mask decoder to integrate prompt features in the memory pool. By taking the predicted masks of entities from previous frames as their visual prompts, UniVS converts different VS tasks into prompt-guided target segmentation, eliminating the heuristic inter-frame matching process. Our framework not only unifies the different VS tasks but also naturally achieves universal training and testing, ensuring robust performance across different scenarios. UniVS shows a commendable balance between performance and universality on 10 challenging VS benchmarks, covering video instance, semantic, panoptic, object, and referring segmentation tasks. Code can be found at https://github.com/MinghanLi/UniVS.
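The query construction described in the abstract can be illustrated with a minimal sketch. It is not taken from the UniVS code: the names `PromptMemory`, `masked_average`, and `mean_vector` are hypothetical, pooling a prompt feature by averaging over the predicted mask is an assumption, and the target-wise prompt cross-attention layer and the mask decoder are left out. It only shows the idea of averaging a target's prompt features from previous frames into an initial query.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def masked_average(features: List[List[Vector]], mask: List[List[bool]]) -> Vector:
    """Pool one prompt feature from a frame by averaging the feature vectors
    at the pixels covered by the target's predicted mask (assumed pooling)."""
    selected = [
        features[y][x]
        for y, row in enumerate(mask)
        for x, inside in enumerate(row)
        if inside
    ]
    return mean_vector(selected)


class PromptMemory:
    """Hypothetical memory pool: target id -> prompt features from past frames."""

    def __init__(self) -> None:
        self._pool: Dict[int, List[Vector]] = {}

    def add(self, target_id: int, prompt_feature: Vector) -> None:
        self._pool.setdefault(target_id, []).append(prompt_feature)

    def initial_query(self, target_id: int) -> Vector:
        """Average the stored prompt features to form the target's initial query."""
        return mean_vector(self._pool[target_id])


# Two previous frames, each a 1x2 grid of 2-d features, with masks for target 7.
frame_a = [[[1.0, 3.0], [9.0, 9.0]]]
frame_b = [[[3.0, 5.0], [3.0, 5.0]]]
memory = PromptMemory()
memory.add(7, masked_average(frame_a, [[True, False]]))
memory.add(7, masked_average(frame_b, [[True, True]]))
query = memory.initial_query(7)  # would seed the mask decoder for the next frame
```

Because every target, whether it was detected by the model or specified by a prompt, ends up with a query built this way, the same decoding path can serve all tasks. That is the sense in which the abstract says the heuristic inter-frame matching step is no longer needed.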

Code

minghanli/univs (official), 119 GitHub stars

Tasks

Referring Expression Segmentation

Referring Video Object Segmentation

Video Instance Segmentation

Video Object Segmentation

Video Object Tracking

Video Panoptic Segmentation

Video Segmentation

Video Semantic Segmentation

Zero-Shot Video Object Segmentation

Datasets

DAVIS

DAVIS 2017

YouTube-VOS 2018

YouTube-VIS 2019

Referring Expressions for DAVIS 2016 & 2017

OVIS

YouTube-VIS 2021

Refer-YouTube-VOS

VSPW

VIPSeg

Results from the Paper

Ranked #2 on Video Semantic Segmentation on VSPW (using extra training data)
