TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Instance Segmentation	OVIS validation	CTVIS (Swin-L)	mask AP	46.9	# 7
Video Instance Segmentation	OVIS validation	CTVIS (Swin-L)	AP50	71.5	# 6
Video Instance Segmentation	OVIS validation	CTVIS (Swin-L)	AP75	47.5	# 10
Video Instance Segmentation	OVIS validation	CTVIS (Swin-L)	APmo	52.1	# 3
Video Instance Segmentation	OVIS validation	CTVIS (Swin-L)	APho	19.1	# 6
Video Instance Segmentation	OVIS validation	CTVIS (ResNet-50)	mask AP	35.5	# 22
Video Instance Segmentation	OVIS validation	CTVIS (ResNet-50)	AP50	60.8	# 19
Video Instance Segmentation	OVIS validation	CTVIS (ResNet-50)	AP75	34.9	# 23
Video Instance Segmentation	OVIS validation	CTVIS (ResNet-50)	APmo	41.9	# 6
Video Instance Segmentation	OVIS validation	CTVIS (ResNet-50)	APho	16.1	# 7
Video Instance Segmentation	Youtube-VIS 2022 Validation	CTVIS (Swin-L)	mAP_L	46.4	# 2
Video Instance Segmentation	Youtube-VIS 2022 Validation	CTVIS (ResNet-50)	mAP_L	39.4	# 4

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ctvis-consistent-training-for-online-video/video-instance-segmentation-on-youtube-vis-3)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-3?p=ctvis-consistent-training-for-online-video)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ctvis-consistent-training-for-online-video/video-instance-segmentation-on-ovis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-ovis-1?p=ctvis-consistent-training-for-online-video)`

CTVIS: Consistent Training for Online Video Instance Segmentation

ICCV 2023 · Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen

The discriminability of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS). Instance embedding learning is directly supervised by the contrastive loss computed on contrastive items (CIs), which are sets of anchor/positive/negative embeddings. Recent online VIS methods leverage CIs sourced from only one reference frame, which we argue is insufficient for learning highly discriminative embeddings. Intuitively, a possible strategy to enhance CIs is to replicate the inference phase during training. To this end, we propose a simple yet effective training strategy, called Consistent Training for Online VIS (CTVIS), which aims to align the training and inference pipelines in terms of building CIs. Specifically, CTVIS constructs CIs by mirroring the momentum-averaged embedding and memory bank storage mechanisms used at inference, and by adding noise to the relevant embeddings. Such an extension allows a reliable comparison between the embeddings of current instances and the stable representations of historical instances, thereby conferring an advantage in modeling VIS challenges such as occlusion, re-identification, and deformation. Empirically, CTVIS outperforms the SOTA VIS models by up to +5.0 points on three VIS benchmarks: YTVIS19 (55.1% AP), YTVIS21 (50.1% AP), and OVIS (35.5% AP). Furthermore, we find that pseudo-videos transformed from images can train robust models that surpass fully supervised ones.
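The mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the momentum value, the temperature, the Gaussian form of the noise, and the toy 2-D embeddings are all assumptions chosen for illustration. It shows a memory bank of momentum-averaged embeddings serving as positives and negatives for an InfoNCE-style contrastive loss, followed by a noisy bank update.

```python
import math
import random


def momentum_update(bank_embedding, new_embedding, momentum=0.75):
    """Exponential moving average of a stored embedding (momentum value is an assumption)."""
    return [momentum * b + (1.0 - momentum) * n
            for b, n in zip(bank_embedding, new_embedding)]


def add_noise(embedding, scale, rng):
    """Gaussian perturbation; an illustrative stand-in for the noise the abstract mentions."""
    return [x + rng.gauss(0.0, scale) for x in embedding]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: negative log of the softmax weight given to the positive."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


rng = random.Random(0)

# Memory bank: instance id -> momentum-averaged embedding from earlier frames (toy values).
bank = {"a": [1.0, 0.0], "b": [0.0, 1.0]}

# Embeddings predicted for the same two instances in the current frame (toy values).
current = {"a": [0.9, 0.1], "b": [0.2, 0.8]}

# Build one contrastive item per instance: anchor = current embedding,
# positive = its own bank entry, negatives = the other bank entries.
losses = {}
for inst_id, emb in current.items():
    positive = bank[inst_id]
    negatives = [e for other_id, e in bank.items() if other_id != inst_id]
    losses[inst_id] = contrastive_loss(emb, positive, negatives)

# Fold the noised current embeddings into the bank, as would happen between frames.
for inst_id, emb in current.items():
    bank[inst_id] = momentum_update(bank[inst_id], add_noise(emb, 0.01, rng))
```

In this toy setup the loss is small when the current embedding of an instance lies close to its own bank entry and large when it lies closer to another instance's entry, which is the property that makes the embeddings useful for association across frames.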

Code

kainingying/ctvis official

Tasks

Instance Segmentation

Semantic Segmentation

Video Instance Segmentation

Datasets

OVIS

Youtube-VIS 2022 Validation

Results from the Paper

Ranked #2 on Video Instance Segmentation on Youtube-VIS 2022 Validation (using extra training data)

