Semi-Supervised Video Object Segmentation
94 papers with code • 15 benchmarks • 13 datasets
The semi-supervised scenario assumes that the user provides a full mask of the object(s) of interest in the first frame of a video sequence. Methods must then produce the segmentation mask for the same object(s) in all subsequent frames.
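The input/output contract of this task can be illustrated with a minimal sketch. The function name `propagate_first_frame_mask` is hypothetical, and the color-prototype rule it uses is only a toy baseline chosen for brevity; it is not the method of any paper listed on this page. It assumes frames are given as a `(T, H, W, 3)` array and the first-frame annotation as an `(H, W)` integer array where 0 is background and each positive integer is an object id.

```python
import numpy as np


def propagate_first_frame_mask(frames, first_mask):
    """Toy semi-supervised VOS baseline (illustrative assumption, not a published method).

    frames:     (T, H, W, 3) array of colors.
    first_mask: (H, W) integer array; 0 = background, k > 0 = object id.
    Returns a (T, H, W) integer array with one label map per frame.
    """
    frames = np.asarray(frames, dtype=np.float64)
    first_mask = np.asarray(first_mask)

    # Labels present in the user-provided first-frame annotation.
    ids = np.unique(first_mask)

    # One mean-color prototype per label, computed from the first frame only.
    prototypes = np.stack(
        [frames[0][first_mask == i].mean(axis=0) for i in ids]
    )  # shape (K, 3)

    masks = np.empty(frames.shape[:3], dtype=first_mask.dtype)
    masks[0] = first_mask  # the first frame is given, not predicted

    for t in range(1, len(frames)):
        # Distance from every pixel to every prototype: shape (H, W, K).
        diff = frames[t][:, :, None, :] - prototypes[None, None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        # Assign each pixel the label of its nearest prototype.
        masks[t] = ids[dist.argmin(axis=-1)]

    return masks
```

Real methods replace the color prototypes with learned features and a memory of past frames, but the interface is the same: one annotated frame in, one mask per frame out.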
Libraries
Use these libraries to find Semi-Supervised Video Object Segmentation models and implementations
Datasets
Latest papers
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.
Video Object Segmentation with Dynamic Query Modulation
Storing intermediate frame segmentations as memory for long-range context modeling, spatial-temporal memory-based methods have recently showcased impressive results in semi-supervised video object segmentation (SVOS).
Lester: rotoscope animation through video object segmentation and tracking
This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos.
ODTrack: Online Dense Temporal Token Learning for Visual Tracking
To alleviate the above problem, we propose a simple, flexible and effective video-level tracking pipeline, named ODTrack, which densely associates the contextual relationships of video frames in an online token propagation manner.
Putting the Object Back into Video Object Segmentation
The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation.
Tracking Anything with Decoupled Video Segmentation
To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation.
XMem++: Production-level Video Segmentation From Few Annotated Frames
Despite advancements in user-guided video segmentation, extracting complex objects consistently for highly complex scenes is still a labor-intensive task, especially for production.
Tracking Anything in High Quality
To further improve the quality of tracking masks, a pretrained mask refiner (MR) model is employed to refine the tracking results.
READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation
We present READMem (Robust Embedding Association for a Diverse Memory), a modular framework for semi-automatic video object segmentation (sVOS) methods designed to handle unconstrained videos.
Video Object Segmentation in Panoptic Wild Scenes
Considering the challenges in panoptic VOS, we propose a strong baseline method named panoptic object association with transformers (PAOT), which uses panoptic identification to associate objects with a pyramid architecture on multiple scales.