Video Object Segmentation
243 papers with code • 9 benchmarks • 17 datasets
Video object segmentation is a binary labeling problem aiming to separate foreground object(s) from the background region of a video.
For leaderboards please refer to the different subtasks.
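As a rough illustration of the binary labeling view described above, the following minimal sketch assigns each pixel of each frame a foreground (1) or background (0) label. The intensity threshold is a hypothetical stand-in for a real segmentation model, and the names `segment_frame`, `segment_video`, and `foreground_fraction` are illustrative assumptions, not part of any library listed here.

```python
from typing import List

Frame = List[List[float]]  # H x W grayscale intensities (toy stand-in for a video frame)
Mask = List[List[int]]     # H x W binary labels: 1 = foreground, 0 = background


def segment_frame(frame: Frame, threshold: float = 0.5) -> Mask:
    """Label each pixel as foreground (1) or background (0).

    The threshold rule is a hypothetical placeholder for a real model.
    """
    return [[1 if pixel > threshold else 0 for pixel in row] for row in frame]


def segment_video(video: List[Frame], threshold: float = 0.5) -> List[Mask]:
    """Apply the per-pixel binary labeling to every frame of the video."""
    return [segment_frame(frame, threshold) for frame in video]


def foreground_fraction(mask: Mask) -> float:
    """Fraction of pixels labeled foreground in one mask."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total if total else 0.0


# A toy 2-frame, 2x3 "video" in which a bright object moves one pixel to the right.
video = [
    [[0.9, 0.1, 0.1],
     [0.8, 0.2, 0.0]],
    [[0.1, 0.9, 0.1],
     [0.2, 0.8, 0.0]],
]
masks = segment_video(video)
```

Here `masks` holds one binary mask per frame, so the output has the same temporal and spatial layout as the input video. Practical methods replace the threshold with a learned model and usually exploit temporal consistency between frames.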
Libraries
Use these libraries to find Video Object Segmentation models and implementations
Datasets
Subtasks
Latest papers
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Despite the recent advances in unified image segmentation (IS), developing a unified video segmentation (VS) model remains a challenge.
Lester: rotoscope animation through video object segmentation and tracking
This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos.
Vivim: a Video Vision Mamba for Medical Video Object Segmentation
Traditional convolutional neural networks have a limited receptive field, while transformer-based networks are computationally costly when modeling long-term dependencies.
OMG-Seg: Is One Model Good Enough For All Segmentation?
In this work, we address various segmentation tasks, each traditionally tackled by distinct or partially unified models.
1st Place Solution for 5th LSVOS Challenge: Referring Video Object Segmentation
Recent transformer-based models have dominated the Referring Video Object Segmentation (RVOS) task due to their superior performance.
Tracking with Human-Intent Reasoning
The perception component then generates the tracking results based on the embeddings.
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
We evaluate our unified models on various benchmarks.
Hierarchical Graph Pattern Understanding for Zero-Shot VOS
However, existing optical flow-based methods depend heavily on the flow estimate, which results in poor performance when optical flow estimation fails for a particular scene.
General Object Foundation Model for Images and Videos at Scale
We present GLEE in this work, an object-level foundation model for locating and identifying objects in images and videos.
Semi-supervised Active Learning for Video Action Detection
First, we demonstrate its effectiveness on video action detection, where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning, along with several baseline approaches, on both UCF101-24 and JHMDB-21.