TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Instance Segmentation	OVIS validation	GenVIS (Swin-L)	mask AP	45.4	# 10
Video Instance Segmentation	OVIS validation	GenVIS (Swin-L)	AP50	69.2	# 8
Video Instance Segmentation	OVIS validation	GenVIS (Swin-L)	AP75	47.8	# 8
Video Instance Segmentation	OVIS validation	GenVIS (Swin-L)	AR1	18.9	# 8
Video Instance Segmentation	OVIS validation	GenVIS (Swin-L)	AR10	49.0	# 10
Video Instance Segmentation	YouTube-VIS 2021	GenVIS (Swin-L)	mask AP	60.1	# 6
Video Instance Segmentation	YouTube-VIS 2021	GenVIS (Swin-L)	AP50	80.9	# 8
Video Instance Segmentation	YouTube-VIS 2021	GenVIS (Swin-L)	AP75	66.5	# 7
Video Instance Segmentation	YouTube-VIS 2021	GenVIS (Swin-L)	AR10	64.7	# 6
Video Instance Segmentation	YouTube-VIS 2021	GenVIS (Swin-L)	AR1	49.1	# 2
Video Instance Segmentation	YouTube-VIS validation	GenVIS (Swin-L)	mask AP	64.0	# 11
Video Instance Segmentation	YouTube-VIS validation	GenVIS (Swin-L)	AP50	84.9	# 10
Video Instance Segmentation	YouTube-VIS validation	GenVIS (Swin-L)	AP75	68.3	# 10
Video Instance Segmentation	YouTube-VIS validation	GenVIS (Swin-L)	AR1	56.1	# 6
Video Instance Segmentation	YouTube-VIS validation	GenVIS (Swin-L)	AR10	69.4	# 5

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-generalized-framework-for-video-instance/video-instance-segmentation-on-youtube-vis-2)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-2?p=a-generalized-framework-for-video-instance)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-generalized-framework-for-video-instance/video-instance-segmentation-on-ovis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-ovis-1?p=a-generalized-framework-for-video-instance)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/a-generalized-framework-for-video-instance/video-instance-segmentation-on-youtube-vis-1)](https://paperswithcode.com/sota/video-instance-segmentation-on-youtube-vis-1?p=a-generalized-framework-for-video-instance)`

A Generalized Framework for Video Instance Segmentation

CVPR 2023 · Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.
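The distinction between online and semi-online execution mentioned in the abstract can be illustrated with a toy sketch. This is not the GenVIS implementation: the function names `split_into_clips` and `run_sequentially`, the `step` callback, and the use of plain floats as stand-ins for instance queries are all assumptions made for illustration. The only idea taken from the abstract is that a video is processed as a sequence of frames or clips, with state carried from one step to the next.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in: one float per tracked instance instead of a real query embedding.
Queries = List[float]
Step = Callable[[Queries, int, int], Queries]


def split_into_clips(num_frames: int, clip_len: int) -> List[Tuple[int, int]]:
    """Return non-overlapping [start, end) frame ranges that cover the video."""
    if clip_len < 1:
        raise ValueError("clip_len must be >= 1")
    return [(s, min(s + clip_len, num_frames)) for s in range(0, num_frames, clip_len)]


def run_sequentially(
    num_frames: int, clip_len: int, init_queries: Queries, step: Step
) -> List[Tuple[int, int, Queries]]:
    """Process clips in order, feeding each step the state left by the previous one."""
    queries = init_queries
    outputs = []
    for start, end in split_into_clips(num_frames, clip_len):
        queries = step(queries, start, end)  # state from the previous clip flows in here
        outputs.append((start, end, queries))
    return outputs


def toy_step(queries: Queries, start: int, end: int) -> Queries:
    """Placeholder update: add the number of frames seen in this clip to every query."""
    return [q + (end - start) for q in queries]


# clip_len=1 mimics frame-by-frame (online) processing; clip_len>1 mimics semi-online.
online = run_sequentially(5, 1, [0.0], toy_step)
semi_online = run_sequentially(5, 2, [0.0], toy_step)
```

In this sketch the same loop serves both modes and only `clip_len` changes, which mirrors the abstract's claim of flexible execution; a real system would replace `toy_step` with a network that consumes the clip's frames.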


Code


miranheo/genvis official

117

Tasks


Instance Segmentation

Semantic Segmentation

Video Instance Segmentation

Datasets

MS COCO

YouTube-VIS 2019

OVIS

YouTube-VIS 2021

Results from the Paper


Ranked #6 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)



