Video Instance Segmentation

85 papers with code • 8 benchmarks • 8 datasets

The goal of video instance segmentation is the simultaneous detection, segmentation and tracking of instances in videos. In other words, the task extends the image instance segmentation problem to the video domain for the first time.

To facilitate research on this new task, a large-scale benchmark called YouTube-VIS was built. It consists of 2,883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks.
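The task definition above implies that a single prediction is not one mask but a sequence of masks for the same instance across frames, together with a category and a confidence score. The following is a minimal sketch of that idea, not code from any of the listed papers or libraries. The names `InstanceTrack` and `sequence_iou` are hypothetical, masks are modelled as plain sets of pixel coordinates for simplicity, and the overlap measure assumes the common convention of accumulating intersection and union over all frames instead of averaging per-frame scores.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Pixel = Tuple[int, int]  # (row, col) coordinate of a foreground pixel


@dataclass
class InstanceTrack:
    """One instance followed through a whole video (hypothetical container)."""
    category: str
    score: float
    # One entry per frame; None means the instance is absent in that frame.
    masks: List[Optional[Set[Pixel]]]


def sequence_iou(a: InstanceTrack, b: InstanceTrack) -> float:
    """Overlap of two mask sequences, accumulated over all frames."""
    if len(a.masks) != len(b.masks):
        raise ValueError("tracks must cover the same number of frames")
    inter = 0
    union = 0
    for mask_a, mask_b in zip(a.masks, b.masks):
        pixels_a = mask_a or set()  # treat an absent instance as an empty mask
        pixels_b = mask_b or set()
        inter += len(pixels_a & pixels_b)
        union += len(pixels_a | pixels_b)
    return inter / union if union else 0.0


# Toy three-frame example: the ground truth leaves the scene in the last frame.
gt = InstanceTrack("person", 1.0, [{(0, 0), (0, 1)}, {(0, 1), (0, 2)}, None])
pred = InstanceTrack("person", 0.9, [{(0, 0), (0, 1)}, {(0, 2)}, {(1, 1)}])
iou = sequence_iou(gt, pred)
```

Because the sums run over the whole sequence, a prediction is penalised both for poor masks and for tracking errors such as continuing to predict an instance after it has left the scene, as in the last frame of the toy example.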

Benchmarks

These leaderboards are used to track progress in Video Instance Segmentation.

Dataset	Best Model
YouTube-VIS validation	DVIS++(VIT-L, Offline)
OVIS validation	DVIS++(VIT-L, Offline)
YouTube-VIS 2021	DVIS++(VIT-L, Offline)
YouTube-VIS 2022 Validation	DVIS++(VIT-L)
BDD100K val	PCAN
HQ-YTVIS	VMT (Swin-L)
YouTube-VIS	STC
YouTube-VIS (trained with no video masks)	MaskFreeVIS

Libraries

Use these libraries to find Video Instance Segmentation models and implementations

Repository	Papers	Stars
hustvl/QueryInst	3	400
open-mmlab/mmdetection	2	27,744
open-mmlab/mmtracking	2	3,375
wjf5203/vnext	2	592

See all 7 libraries.

Most implemented papers

Simple Online and Realtime Tracking with a Deep Association Metric

nwojke/deep_sort • 21 Mar 2017

Simple Online and Realtime Tracking (SORT) is a pragmatic approach to multiple object tracking with a focus on simple, effective algorithms.

Video Instance Segmentation

Epiphqny/VisTR • ICCV 2019

The goal of this new task is simultaneous detection, segmentation and tracking of instances in videos.

Instances as Queries

hustvl/QueryInst • ICCV 2021

The key insight of QueryInst is to leverage the intrinsic one-to-one correspondence in object queries across different stages, as well as one-to-one correspondence between mask RoI features and object queries in the same stage.

Mask2Former for Video Instance Segmentation

facebookresearch/Mask2Former • 20 Dec 2021

We find Mask2Former also achieves state-of-the-art performance on video instance segmentation without modifying the architecture, the loss or even the training pipeline.

Temporally Efficient Vision Transformer for Video Instance Segmentation

hustvl/tevit • CVPR 2022

To effectively and efficiently model the crucial temporal information within a video clip, we propose a Temporally Efficient Vision Transformer (TeViT) for video instance segmentation (VIS).

End-to-End Video Instance Segmentation with Transformers

Epiphqny/VisTR • CVPR 2021

Here, we propose a new video instance segmentation framework built upon Transformers, termed VisTR, which views the VIS task as a direct end-to-end parallel sequence decoding/prediction problem.

Occluded Video Instance Segmentation: A Benchmark

qjy981010/CMaskTrack-RCNN • 2 Feb 2021

On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 16.3, which reveals that we are still at a nascent stage for understanding objects, instances, and videos in a real-world scenario.
