Video Instance Segmentation using Inter-Frame Communication Transformers

We propose a novel end-to-end solution for video instance segmentation (VIS) based on transformers. Recently, per-clip pipelines have shown superior performance over per-frame methods by leveraging richer information from multiple frames. However, previous per-clip models require heavy computation and memory to achieve frame-to-frame communication, limiting their practicality. In this work, we propose Inter-frame Communication Transformers (IFC), which significantly reduces the overhead of information passing between frames by efficiently encoding the context within the input clip. Specifically, we propose to utilize concise memory tokens as a means of conveying information as well as summarizing each frame's scene. The features of each frame are enriched and correlated with those of other frames through the exchange of information between the precisely encoded memory tokens. We validate our method on the latest benchmarks and achieve state-of-the-art performance (AP 44.6 on the YouTube-VIS 2019 val set with offline inference) at a considerably fast runtime (89.4 FPS). Our method can also be applied to near-online inference, processing a video in real time with only a small delay. The code will be made available.
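The abstract describes the memory-token mechanism only at a high level. Below is a minimal PyTorch sketch of how such a scheme could look: each frame carries a small set of memory tokens that summarize it, and cross-frame communication happens only between these tokens rather than between all spatial features. All names (IFCEncoderBlock, num_memory, the two-stage attention layout) are hypothetical illustrations, not the authors' implementation.

```python
# Hypothetical sketch of an IFC-style encoder block (not the authors' code).
# Per-frame features attend jointly with memory tokens ("encode"), then the
# memory tokens of all frames attend to each other ("communicate").
import torch
import torch.nn as nn

class IFCEncoderBlock(nn.Module):
    def __init__(self, dim=256, num_heads=8, num_memory=8):
        super().__init__()
        self.num_memory = num_memory
        # Within-frame attention over [spatial tokens; memory tokens].
        self.encode_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-frame attention over memory tokens only.
        self.comm_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, frame_feats, memory):
        # frame_feats: (T, N, C) spatial tokens per frame; memory: (T, M, C)
        T, N, C = frame_feats.shape
        # 1) Encode: features and memory tokens attend within each frame,
        #    so the memory tokens summarize the frame scene.
        x = torch.cat([frame_feats, memory], dim=1)           # (T, N+M, C)
        x = self.norm1(x + self.encode_attn(x, x, x)[0])
        frame_feats, memory = x[:, :N], x[:, N:]
        # 2) Communicate: memory tokens from all frames exchange information,
        #    avoiding dense frame-to-frame attention over spatial features.
        m = memory.reshape(1, T * self.num_memory, C)         # (1, T*M, C)
        m = self.norm2(m + self.comm_attn(m, m, m)[0])
        memory = m.reshape(T, self.num_memory, C)
        return frame_feats, memory

# Toy usage: a 5-frame clip with 14x14 feature tokens per frame.
block = IFCEncoderBlock()
feats = torch.randn(5, 196, 256)
mem = torch.randn(5, 8, 256)
feats, mem = block(feats, mem)
```

The intuition behind the efficiency claim: with M memory tokens per frame and M much smaller than the N spatial tokens, the communication step costs on the order of (T·M)^2 attention interactions instead of the (T·N)^2 required by dense clip-level attention.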

PDF · Abstract (NeurIPS 2021)
Task: Video Instance Segmentation
Dataset: YouTube-VIS validation
Model: IFC (ResNet-50)

Metric    Value   Global Rank
mask AP   42.8    #32
AP50      65.8    #30
AP75      46.8    #30
AR1       43.8    #23
AR10      51.2    #26
