Video Object Segmentation
240 papers with code • 9 benchmarks • 17 datasets
Video object segmentation is a binary labeling problem aiming to separate foreground object(s) from the background region of a video.
For leaderboards, please refer to the individual subtasks.
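To make the binary-labeling framing above concrete, here is a minimal sketch in plain Python. It treats a video as a list of grayscale frames and a segmentation as one boolean mask per frame. The intensity threshold is only a toy stand-in for a real segmentation model, and scoring with per-frame intersection-over-union (the Jaccard index) is an assumption added for illustration; neither comes from this page, and all names are hypothetical.

```python
from typing import List

Frame = List[List[int]]    # one grayscale frame: H x W pixel intensities
Mask = List[List[bool]]    # one binary mask: True = foreground, False = background


def threshold_frame(frame: Frame, threshold: int) -> Mask:
    """Toy 'segmenter': label a pixel foreground when it is brighter than threshold."""
    return [[pixel > threshold for pixel in row] for row in frame]


def frame_iou(pred: Mask, truth: Mask) -> float:
    """Intersection-over-union (Jaccard index) between two masks of equal size."""
    pairs = [(p, t) for p_row, t_row in zip(pred, truth) for p, t in zip(p_row, t_row)]
    intersection = sum(p and t for p, t in pairs)
    union = sum(p or t for p, t in pairs)
    # Two empty masks agree perfectly, so define their IoU as 1.0.
    return intersection / union if union else 1.0


def mean_iou(pred_video: List[Mask], truth_video: List[Mask]) -> float:
    """Average the per-frame IoU over all frames of a video."""
    if not pred_video or len(pred_video) != len(truth_video):
        raise ValueError("videos must be non-empty and have the same number of frames")
    scores = [frame_iou(p, t) for p, t in zip(pred_video, truth_video)]
    return sum(scores) / len(scores)


# Hypothetical two-frame video, 2 x 3 pixels per frame.
video = [
    [[10, 200, 220], [15, 12, 210]],
    [[5, 190, 8], [7, 230, 9]],
]
# Hypothetical ground-truth masks for the same two frames.
truth = [
    [[False, True, True], [False, True, True]],
    [[False, True, False], [False, True, False]],
]

pred = [threshold_frame(frame, threshold=128) for frame in video]
score = mean_iou(pred, truth)
print(score)  # 0.875: frame 0 scores 3/4, frame 1 scores 1.0
```

In practice the masks would come from a learned model and the frames would be image arrays, but the data layout (one foreground/background label per pixel per frame) stays the same.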
Libraries
Use these libraries to find Video Object Segmentation models and implementations.
Datasets
Subtasks
Latest papers
Towards Temporally Consistent Referring Video Object Segmentation
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects.
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.
PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model
PSALM is a powerful extension of the Large Multi-modal Model (LMM) to address the segmentation task challenges.
Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
We hypothesize that the latent representation learned from a pretrained generative T2V model encapsulates rich semantics and coherent temporal correspondences, thereby naturally facilitating video understanding.
Video Object Segmentation with Dynamic Query Modulation
Storing intermediate frame segmentations as memory for long-range context modeling, spatial-temporal memory-based methods have recently showcased impressive results in semi-supervised video object segmentation (SVOS).
VideoMAC: Video Masked Autoencoders Meet ConvNets
In this paper, we propose a new approach termed VideoMAC, which combines video masked autoencoders with resource-friendly ConvNets.
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.
Lester: rotoscope animation through video object segmentation and tracking
This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos.
Vivim: a Video Vision Mamba for Medical Video Object Segmentation
Traditional convolutional neural networks have a limited receptive field, while transformer-based networks struggle to model long-term dependencies because of their computational complexity.
OMG-Seg: Is One Model Good Enough For All Segmentation?
In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.