Video Object Segmentation
243 papers with code • 9 benchmarks • 17 datasets
Video object segmentation is a binary labeling problem aiming to separate foreground object(s) from the background region of a video.
For leaderboards please refer to the different subtasks.
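To make the binary labeling above concrete, here is a minimal sketch in plain Python. It assumes each frame is stored as a grid of booleans (True for foreground) and compares a predicted labeling against a reference one using intersection-over-union. The names `frame_iou` and `video_iou`, the nested-list mask layout, and the choice to score two empty masks as a perfect match are illustrative assumptions, not something this page or any specific benchmark defines.

```python
from typing import List

# Assumed layout: one frame is H rows of W booleans, True = foreground pixel.
Mask = List[List[bool]]


def frame_iou(pred: Mask, truth: Mask) -> float:
    """Intersection-over-union of two binary masks for a single frame."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: if neither mask marks any foreground, treat it as a match.
    return inter / union if union else 1.0


def video_iou(pred: List[Mask], truth: List[Mask]) -> float:
    """Mean per-frame IoU over a video (hypothetical helper)."""
    if not pred or len(pred) != len(truth):
        raise ValueError("videos must be non-empty and have the same length")
    return sum(frame_iou(p, t) for p, t in zip(pred, truth)) / len(pred)


# Two frames of 2x2 pixels: a reference labeling and a prediction.
truth = [
    [[False, True], [False, True]],
    [[True, True], [False, False]],
]
pred = [
    [[False, True], [False, False]],  # misses one foreground pixel
    [[True, True], [False, False]],   # matches exactly
]

print(frame_iou(pred[0], truth[0]))  # 0.5
print(video_iou(pred, truth))        # 0.75
```

Real datasets typically store masks as image files or arrays rather than nested lists; the nested-list form is used here only to keep the sketch dependency-free.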
Libraries
Use these libraries to find Video Object Segmentation models and implementations
Datasets
Subtasks
Latest papers with no code
Zero-Shot Open-Vocabulary Tracking with Large Pre-Trained Models
This begs the question: can we re-purpose these large-scale pre-trained static image models for open-vocabulary video tracking?
Sub-token ViT Embedding via Stochastic Resonance Transformers
We term our method "Stochastic Resonance Transformer" (SRT), which we show can effectively super-resolve features of pre-trained ViTs, capturing more of the local fine-grained structures that might otherwise be neglected as a result of tokenization.
CoralVOS: Dataset and Benchmark for Coral Video Segmentation
We perform experiments on our CoralVOS dataset with 6 recent state-of-the-art video object segmentation (VOS) algorithms.
Memory-Efficient Continual Learning Object Segmentation for Long Video
We propose two novel techniques to reduce the memory requirement of Online VOS methods while improving modeling accuracy and generalization on long videos.
Adversarial Attacks on Video Object Segmentation with Hard Region Discovery
Particularly, the gradients from the segmentation model are exploited to discover the easily confused region, in which it is difficult to identify the pixel-wise objects from the background in a frame.
Fully Transformer-Equipped Architecture for End-to-End Referring Video Object Segmentation
Referring Video Object Segmentation (RVOS) requires segmenting the object in a video that is referred to by a natural language query.
Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation
Unsupervised Video Object Segmentation (VOS) aims at identifying the contours of primary foreground objects in videos without any prior knowledge.
Temporal Collection and Distribution for Referring Video Object Segmentation
Furthermore, to explicitly capture object motions and spatial-temporal cross-modal reasoning over objects, we propose a novel temporal collection-distribution mechanism for interacting between the global referent token and object queries.
Robust Visual Tracking by Motion Analyzing
In this paper, we propose a new algorithm that addresses this limitation by analyzing the motion pattern using the inherent tensor structure.
Joint Modeling of Feature, Correspondence, and a Compressed Memory for Video Object Segmentation
To overcome these issues, we propose a unified VOS framework, coined JointFormer, for jointly modeling the three elements of feature, correspondence, and a compressed memory.