Video Instance Segmentation

85 papers with code • 8 benchmarks • 8 datasets

The goal of video instance segmentation is the simultaneous detection, segmentation and tracking of instances in videos. In other words, it extends the image instance segmentation problem to the video domain for the first time.

To facilitate research on this new task, a large-scale benchmark called YouTube-VIS was built. It consists of 2,883 high-resolution YouTube videos, a 40-category label set and 131k high-quality instance masks.

Most implemented papers

SeqFormer: Sequential Transformer for Video Instance Segmentation

wjf5203/SeqFormer 15 Dec 2021

Nevertheless, we observe that a stand-alone instance query suffices for capturing a time sequence of instances in a video, but attention mechanisms shall be done with each frame independently.

RankSeg: Adaptive Pixel Classification with Image Category Ranking for Segmentation

openseg-group/rankseg 8 Mar 2022

Given an input image or video, our framework first conducts multi-label classification over the complete label, then sorts the complete label and selects a small subset according to their class confidence scores.
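The rank-and-select step described in this excerpt can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the RankSeg implementation: the function name `select_top_k_labels`, the score dictionary and the choice of `k` are all hypothetical, and the scores stand in for the output of a multi-label classifier.

```python
def select_top_k_labels(class_scores, k):
    """Return the k labels with the highest confidence, best first.

    class_scores: dict mapping label name -> confidence score
    (hypothetical input standing in for a multi-label classifier's output).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending confidence; break ties by label name so the
    # result is deterministic.
    ranked = sorted(class_scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Illustrative scores only; not taken from any dataset.
scores = {"person": 0.92, "dog": 0.75, "car": 0.10, "boat": 0.03}
selected = select_top_k_labels(scores, k=2)
```

A later pixel classifier would then only need to consider the labels in `selected` rather than the complete label set.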

MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training

nvlabs/minvis 3 Aug 2022

By only training a query-based image instance segmentation model, MinVIS outperforms the previous best result on the challenging Occluded VIS dataset by over 10% AP.

Tube-Link: A Flexible Cross Tube Framework for Universal Video Segmentation

lxtgh/tube-link ICCV 2023

Our framework is a near-online approach that takes a short subclip as input and outputs the corresponding spatial-temporal tube masks.
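One way to picture the subclip-based input is to split a video's frame indices into short consecutive windows. The sketch below is only an assumption-laden illustration: the helper name `split_into_subclips`, the fixed window length and the non-overlapping layout are hypothetical and are not taken from the Tube-Link code.

```python
def split_into_subclips(num_frames, clip_len):
    """Split frame indices 0..num_frames-1 into consecutive subclips.

    Assumes non-overlapping windows of at most clip_len frames;
    the last subclip may be shorter.
    """
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    return [
        list(range(start, min(start + clip_len, num_frames)))
        for start in range(0, num_frames, clip_len)
    ]


# A 7-frame video cut into subclips of up to 3 frames each.
clips = split_into_subclips(7, 3)
```

Each subclip would be processed as one unit, and the per-subclip tube masks would then need to be linked across subclips to cover the full video.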

Spatio-temporal Prompting Network for Robust Video Feature Extraction

guanxiongsun/vfe.pytorch ICCV 2023

Then, these video prompts are prepended to the patch embeddings of the current frame as the updated input for video feature extraction.
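Prepending prompts to patch embeddings amounts to concatenating two token sequences along the sequence dimension. The following is a minimal sketch using plain Python lists; the function name `prepend_prompts`, the token values and the two-dimensional embeddings are assumptions for illustration and do not reflect the actual code in the listed repository.

```python
def prepend_prompts(video_prompts, patch_embeddings):
    """Place prompt tokens in front of the current frame's patch tokens.

    Both arguments are lists of token vectors (lists of floats) that are
    assumed to share the same embedding dimension.
    """
    dims = {len(token) for token in video_prompts + patch_embeddings}
    if len(dims) > 1:
        raise ValueError("all tokens must share one embedding dimension")
    # List concatenation keeps prompts first, followed by patch tokens.
    return video_prompts + patch_embeddings


# Illustrative values: 2 prompt tokens and 3 patch tokens of dimension 2.
prompts = [[0.1, 0.2], [0.3, 0.4]]
patches = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
tokens = prepend_prompts(prompts, patches)
```

The resulting sequence has length `len(prompts) + len(patches)` and would serve as the input to the feature extractor in place of the patch tokens alone.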

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

opengvlab/internvideo2 22 Mar 2024

We introduce InternVideo2, a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

Efficient Video Object Segmentation via Network Modulation

linjieyangsc/video_seg CVPR 2018

Video object segmentation aims to segment a specific object throughout a video sequence, given only an annotated first frame.

Instance-wise Depth and Motion Learning from Monocular Videos

SeokjuLee/Insta-DM 19 Dec 2019

We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.

Learning a Spatio-Temporal Embedding for Video Instance Segmentation

jdc08161063/spatio-temporal-embedding 19 Dec 2019

We present a novel embedding approach for video instance segmentation.

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

sabarim/STEm-Seg ECCV 2020

In this paper, we propose a different approach that is well-suited to a variety of tasks involving instance segmentation in videos.