Video Object Segmentation
243 papers with code • 9 benchmarks • 17 datasets
Video object segmentation is a binary labeling problem that aims to separate the foreground object(s) from the background region of a video.
For leaderboards please refer to the different subtasks.
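As a concrete illustration of the binary-labeling formulation, the sketch below propagates a first-frame mask through a clip with a deliberately naive baseline: a pixel is labeled foreground if its color is close to the mean foreground color of the first frame. This is a hypothetical toy example (function name, threshold, and color-matching rule are all assumptions for illustration), not any of the methods listed on this page.

```python
import numpy as np

def propagate_mask(frames, first_mask, threshold=30.0):
    """Naive VOS baseline (illustrative only): label a pixel as foreground
    when its color lies within `threshold` of the mean foreground color
    taken from the annotated first frame.

    frames:     array of shape (T, H, W, 3), float RGB values
    first_mask: boolean array of shape (H, W), ground-truth mask of frame 0
    returns:    list of T boolean (H, W) masks, one per frame
    """
    # Mean RGB color of the object pixels in the first frame.
    fg_color = frames[0][first_mask].mean(axis=0)
    masks = [first_mask]
    for frame in frames[1:]:
        # Per-pixel Euclidean distance to the foreground color.
        dist = np.linalg.norm(frame - fg_color, axis=-1)
        masks.append(dist < threshold)
    return masks
```

Real VOS methods replace this color heuristic with learned appearance and motion features, but the input/output contract (frames plus a reference mask in, one binary mask per frame out) is the same.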
Latest papers
Flexible visual prompts for in-context learning in computer vision
Additionally, we propose a technique for support set selection, which involves choosing the most relevant images to include in this set.
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
In this paper, we propose a simple yet effective approach for self-supervised video object segmentation (VOS).
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
In-context segmentation aims to segment novel images using a few labeled example images, termed "in-context examples", by exploiting content similarities between the examples and the target.
Putting the Object Back into Video Object Segmentation
We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result.
Treating Motion as Option with Output Selection for Unsupervised Video Object Segmentation
Unsupervised video object segmentation (VOS) is a task that aims to detect the most salient object in a video without external guidance about the object.
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
Our dataset poses new challenges in panoramic VOS and we hope that our PanoVOS can advance the development of panoramic segmentation/tracking.
Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation
We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively.
Tracking Anything with Decoupled Video Segmentation
To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation.
Learning Cross-Modal Affinity for Referring Video Object Segmentation Targeting Limited Samples
Referring video object segmentation (RVOS), as a supervised learning task, relies on sufficient annotated data for a given scene.
Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation
Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS).