A Generalized Framework for Video Instance Segmentation

The handling of long videos with complex and occluded sequences has recently emerged as a new challenge in the video instance segmentation (VIS) community. However, existing methods have limitations in addressing this challenge. We argue that the biggest bottleneck in current approaches is the discrepancy between training and inference. To effectively bridge this gap, we propose a Generalized framework for VIS, namely GenVIS, that achieves state-of-the-art performance on challenging benchmarks without designing complicated architectures or requiring extra post-processing. The key contribution of GenVIS is the learning strategy, which includes a query-based training pipeline for sequential learning with a novel target label assignment. Additionally, we introduce a memory that effectively acquires information from previous states. Thanks to the new perspective, which focuses on building relationships between separate frames or clips, GenVIS can be flexibly executed in both online and semi-online manners. We evaluate our approach on popular VIS benchmarks, achieving state-of-the-art results on YouTube-VIS 2019/2021/2022 and Occluded VIS (OVIS). Notably, we greatly outperform the state-of-the-art on the long VIS benchmark (OVIS), improving by 5.6 AP with a ResNet-50 backbone. Code is available at https://github.com/miranheo/GenVIS.
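To make the sequential, query-based idea in the abstract more concrete, here is a minimal Python sketch of how a video might be processed clip by clip while carrying instance queries and a small memory forward. It is an illustration under stated assumptions, not the GenVIS implementation: the names `split_into_clips`, `run_sequential`, `toy_decoder`, and `memory_size` are hypothetical, the decoder is a stand-in for a real model, and treating a clip length of 1 as "online" and a larger clip length as "semi-online" is our reading of the abstract.

```python
from typing import Callable, List

Query = List[float]  # one embedding per tracked instance (hypothetical representation)


def split_into_clips(num_frames: int, clip_len: int) -> List[range]:
    """Partition frame indices into consecutive, non-overlapping clips.

    Assumption: clip_len == 1 corresponds to online (per-frame) execution,
    clip_len > 1 to semi-online (per-clip) execution.
    """
    if clip_len < 1:
        raise ValueError("clip_len must be >= 1")
    return [range(start, min(start + clip_len, num_frames))
            for start in range(0, num_frames, clip_len)]


def run_sequential(
    num_frames: int,
    clip_len: int,
    init_queries: List[Query],
    decode_clip: Callable[[range, List[Query], List[List[Query]]], List[Query]],
    memory_size: int = 3,
) -> list:
    """Process clips in order, feeding each clip's output queries to the next.

    Slot i of the query list is treated as the same instance in every clip,
    so no separate matching step is performed between clips in this sketch.
    """
    queries = init_queries
    memory: List[List[Query]] = []  # queries from the most recent clips
    outputs = []
    for clip in split_into_clips(num_frames, clip_len):
        queries = decode_clip(clip, queries, memory)
        outputs.append((clip, queries))
        # Keep only the last `memory_size` query states.
        memory = (memory + [queries])[-memory_size:]
    return outputs


def toy_decoder(clip: range, queries: List[Query],
                memory: List[List[Query]]) -> List[Query]:
    """Stand-in for a real decoder: shifts every query by the clip length.

    A real model would attend to image features and to `memory` here.
    """
    return [[value + len(clip) for value in query] for query in queries]


if __name__ == "__main__":
    for clip, queries in run_sequential(5, 2, [[0.0], [10.0]], toy_decoder):
        print(list(clip), queries)
```

Calling `run_sequential` with `clip_len=1` would step through the video one frame at a time, while a larger `clip_len` processes several frames per step; in both cases the same loop and the same carried-over queries are used, which is the flexibility the abstract describes.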

CVPR 2023

Results from the Paper


Ranked #6 on Video Instance Segmentation on YouTube-VIS 2021 (using extra training data)

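Task                         Dataset                  Model            Metric Name  Metric Value  Global Rank
Video Instance Segmentation  OVIS validation          GenVIS (Swin-L)  mask AP      45.4          #10
Video Instance Segmentation  OVIS validation          GenVIS (Swin-L)  AP50         69.2          #8
Video Instance Segmentation  OVIS validation          GenVIS (Swin-L)  AP75         47.8          #8
Video Instance Segmentation  OVIS validation          GenVIS (Swin-L)  AR1          18.9          #8
Video Instance Segmentation  OVIS validation          GenVIS (Swin-L)  AR10         49.0          #10
Video Instance Segmentation  YouTube-VIS 2021         GenVIS (Swin-L)  mask AP      60.1          #6
Video Instance Segmentation  YouTube-VIS 2021         GenVIS (Swin-L)  AP50         80.9          #8
Video Instance Segmentation  YouTube-VIS 2021         GenVIS (Swin-L)  AP75         66.5          #7
Video Instance Segmentation  YouTube-VIS 2021         GenVIS (Swin-L)  AR10         64.7          #6
Video Instance Segmentation  YouTube-VIS 2021         GenVIS (Swin-L)  AR1          49.1          #2
Video Instance Segmentation  YouTube-VIS validation   GenVIS (Swin-L)  mask AP      64.0          #11
Video Instance Segmentation  YouTube-VIS validation   GenVIS (Swin-L)  AP50         84.9          #10
Video Instance Segmentation  YouTube-VIS validation   GenVIS (Swin-L)  AP75         68.3          #10
Video Instance Segmentation  YouTube-VIS validation   GenVIS (Swin-L)  AR1          56.1          #6
Video Instance Segmentation  YouTube-VIS validation   GenVIS (Swin-L)  AR10         69.4          #5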
