TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Instance Segmentation	OVIS validation	IDOL (ResNet-50)	mask AP	30.2	# 28
Video Instance Segmentation	OVIS validation	IDOL (ResNet-50)	AP50	51.3	# 28
Video Instance Segmentation	OVIS validation	IDOL (ResNet-50)	AP75	30	# 28
Video Instance Segmentation	OVIS validation	IDOL (ResNet-50)	AR1	15	# 22
Video Instance Segmentation	OVIS validation	IDOL (ResNet-50)	AR10	37.5	# 20
Video Instance Segmentation	OVIS validation	IDOL (Swin-L)	mask AP	42.6	# 13
Video Instance Segmentation	OVIS validation	IDOL (Swin-L)	AP50	65.7	# 15
Video Instance Segmentation	OVIS validation	IDOL (Swin-L)	AP75	45.2	# 11
Video Instance Segmentation	OVIS validation	IDOL (Swin-L)	AR1	17.9	# 13
Video Instance Segmentation	OVIS validation	IDOL (Swin-L)	AR10	49.6	# 7
Video Instance Segmentation	YouTube-VIS 2021	IDOL (Swin-L)	mask AP	56.1	# 12
Video Instance Segmentation	YouTube-VIS 2021	IDOL (Swin-L)	AP50	80.8	# 9
Video Instance Segmentation	YouTube-VIS 2021	IDOL (Swin-L)	AP75	63.5	# 10
Video Instance Segmentation	YouTube-VIS 2021	IDOL (Swin-L)	AR10	60.1	# 15
Video Instance Segmentation	YouTube-VIS 2021	IDOL (Swin-L)	AR1	45	# 14
Video Instance Segmentation	YouTube-VIS validation	IDOL (Swin-L)	mask AP	64.3	# 9
Video Instance Segmentation	YouTube-VIS validation	IDOL (Swin-L)	AP50	87.5	# 6
Video Instance Segmentation	YouTube-VIS validation	IDOL (Swin-L)	AP75	71.0	# 8
Video Instance Segmentation	YouTube-VIS validation	IDOL (Swin-L)	AR1	55.6	# 9
Video Instance Segmentation	YouTube-VIS validation	IDOL (Swin-L)	AR10	69.1	# 6
Video Instance Segmentation	YouTube-VIS validation	IDOL (ResNet-50)	mask AP	49.5	# 23
Video Instance Segmentation	YouTube-VIS validation	IDOL (ResNet-50)	AP50	74	# 21
Video Instance Segmentation	YouTube-VIS validation	IDOL (ResNet-50)	AP75	52.9	# 24
Video Instance Segmentation	YouTube-VIS validation	IDOL (ResNet-50)	AR1	47.7	# 18
Video Instance Segmentation	YouTube-VIS validation	IDOL (ResNet-50)	AR10	58.7	# 18

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-youtube-vis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-1?p=in-defense-of-online-models-for-video)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-youtube-vis-2)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-2?p=in-defense-of-online-models-for-video)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/in-defense-of-online-models-for-video/video-instance-segmentation-on-ovis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-ovis-1?p=in-defense-of-online-models-for-video)`

In Defense of Online Models for Video Instance Segmentation

21 Jul 2022 · Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

In recent years, video instance segmentation (VIS) has been largely advanced by offline models, while online models have gradually attracted less attention, possibly due to their inferior performance. However, online methods have an inherent advantage in handling long video sequences and ongoing videos, where offline models fail due to limited computational resources. Therefore, it would be highly desirable if online models could achieve comparable or even better performance than offline models. By dissecting current online and offline models, we demonstrate that the main cause of the performance gap is error-prone association between frames, caused by the similar appearance of different instances in the feature space. Observing this, we propose an online framework based on contrastive learning that learns more discriminative instance embeddings for association and fully exploits history information for stability. Despite its simplicity, our method outperforms all online and offline methods on three benchmarks. Specifically, we achieve 49.5 AP on YouTube-VIS 2019, a significant improvement of 13.2 AP and 2.1 AP over the prior online and offline art, respectively. Moreover, we achieve 30.2 AP on OVIS, a more challenging dataset with significant crowding and occlusions, surpassing the prior art by 14.8 AP. The proposed method won first place in the video instance segmentation track of the 4th Large-scale Video Object Segmentation Challenge (CVPR 2022). We hope the simplicity and effectiveness of our method, as well as our insight into current methods, could shed light on the exploration of VIS models.
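The abstract's core idea, discriminative instance embeddings used to associate instances across frames, can be illustrated with a small sketch. It is not the paper's implementation. It assumes a generic InfoNCE-style contrastive loss and a greedy cosine-similarity matcher, and the names `cosine`, `info_nce`, `associate`, `temperature`, and `min_similarity` are hypothetical choices made only for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumption, not the paper's exact loss).

    The loss is small when the anchor embedding is more similar to the
    positive (same instance) than to the negatives (other instances).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


def associate(memory, detections, min_similarity=0.5):
    """Greedily match current-frame embeddings to stored track embeddings.

    memory: dict mapping track id -> embedding from earlier frames.
    detections: list of embeddings from the current frame.
    Returns one track id (or None for an unmatched detection) per detection.
    """
    pairs = sorted(
        ((cosine(emb, det), track_id, j)
         for track_id, emb in memory.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    assigned = [None] * len(detections)
    used = set()
    for similarity, track_id, j in pairs:
        if similarity < min_similarity:
            break  # remaining pairs are even less similar
        if track_id in used or assigned[j] is not None:
            continue
        assigned[j] = track_id
        used.add(track_id)
    return assigned


if __name__ == "__main__":
    memory = {1: [1.0, 0.0], 2: [0.0, 1.0]}
    detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
    print(associate(memory, detections))
```

In this toy setting, the better the embeddings separate different instances, the less often the matcher confuses two similar-looking objects, which is the failure mode the abstract identifies as the main cause of the online/offline performance gap.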

Code

wjf5203/vnext (official), 592 stars

Tasks

Contrastive Learning

Instance Segmentation

Segmentation

Semantic Segmentation

Video Instance Segmentation

Video Object Segmentation

Video Semantic Segmentation

Datasets

YouTube-VIS 2019

OVIS

YouTube-VIS 2021

Results from the Paper

Ranked #9 on Video Instance Segmentation on YouTube-VIS validation (using extra training data)

Methods

Contrastive Learning
