Video Saliency Detection
19 papers with code • 5 benchmarks • 2 datasets
Latest papers
An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos
In this work, we present an integrated system for spatio-temporal summarization of 360-degree videos.
Panoramic Vision Transformer for Saliency Detection in 360° Videos
360° video saliency detection is one of the challenging benchmarks for 360° video understanding, since non-negligible distortion and discontinuity occur in the projection of any 360° video format, and the capture-worthy viewpoint on the omnidirectional sphere is ambiguous by nature.
A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Video saliency detection (VSD) aims to quickly locate the most attractive objects, things, or patterns in a given video clip.
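To make the task concrete, here is a minimal toy sketch of spatio-temporal saliency on grayscale frames: it blends a spatial cue (per-pixel deviation from the frame's mean intensity) with a temporal cue (absolute change between consecutive frames). This is purely illustrative; the papers listed here use deep networks, and the function names and the blending weight `alpha` are assumptions of this sketch.

```python
# Toy spatio-temporal saliency sketch. Frames are nested lists of
# floats in [0, 1]; real VSD models are deep networks, this is not
# any listed paper's method.

def spatial_contrast(frame):
    """Per-pixel absolute deviation from the frame's mean intensity."""
    flat = [v for row in frame for v in row]
    mean = sum(flat) / len(flat)
    return [[abs(v - mean) for v in row] for row in frame]

def temporal_change(prev, curr):
    """Per-pixel absolute difference between consecutive frames."""
    return [[abs(c - p) for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]

def saliency_map(prev, curr, alpha=0.5):
    """Blend spatial and temporal cues; alpha weights the spatial term."""
    s = spatial_contrast(curr)
    t = temporal_change(prev, curr)
    return [[alpha * sv + (1 - alpha) * tv
             for sv, tv in zip(sr, tr)]
            for sr, tr in zip(s, t)]

prev = [[0.0, 0.0], [0.0, 0.0]]
curr = [[0.0, 1.0], [0.0, 0.0]]  # one bright pixel appears
smap = saliency_map(prev, curr)  # highest value at the changed pixel
```

With both cues agreeing, the pixel that is both bright (high contrast) and newly appeared (high change) gets the largest saliency value.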
GASP: Gated Attention For Saliency Prediction
We show that gaze direction and affective representations improve the correspondence between predictions and ground truth by at least 5% compared with dynamic saliency models that lack social cues.
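As a rough illustration of gated fusion over social-cue modalities, the sketch below weights per-modality feature vectors (e.g. gaze, affect, dynamic saliency) with a softmax over scalar gate scores and sums them. This is a hypothetical simplification, not the actual GASP architecture; in a trained model the gate scores would come from a learned gating network.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gated_fusion(features, gate_scores):
    """Weighted sum of modality feature vectors.

    features: dict modality -> feature vector (equal lengths)
    gate_scores: dict modality -> scalar gate logit (assumed given
    here; a real model would learn these)
    """
    names = sorted(features)
    weights = softmax([gate_scores[n] for n in names])
    dim = len(features[names[0]])
    fused = [0.0] * dim
    for w, n in zip(weights, names):
        for i, v in enumerate(features[n]):
            fused[i] += w * v
    return fused, dict(zip(names, weights))

feats = {"gaze": [1.0, 0.0], "affect": [0.0, 1.0], "saliency": [0.5, 0.5]}
scores = {"gaze": 2.0, "affect": 0.0, "saliency": 0.0}
fused, weights = gated_fusion(feats, scores)  # gaze dominates the mix
```

Because the gaze modality has the largest gate score, its feature vector dominates the fused representation, mirroring the idea that informative social cues should be weighted up.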
Weakly Supervised Visual-Auditory Fixation Prediction with Multigranularity Perception
Moreover, we distill knowledge from these regions to obtain complete new spatial-temporal-audio (STA) fixation prediction (FP) networks, enabling broad applications in cases where video tags are not available.
From Semantic Categories to Fixations: A Novel Weakly-Supervised Visual-Auditory Saliency Detection Approach
Thanks to rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily and significantly.
Weakly Supervised Video Salient Object Detection
Significant performance improvements have been achieved for fully supervised video salient object detection with pixel-wise labeled training datasets, which are time-consuming and expensive to obtain.
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
We also explore a variation of ViNet architecture by augmenting audio features into the decoder.
Hierarchical Domain-Adapted Feature Learning for Video Saliency Prediction
When the base hierarchical model is empowered with domain-specific modules, performance improves, outperforming state-of-the-art models on three out of five metrics on the DHF1K benchmark and reaching the second-best results on the other two.
Exploring Rich and Efficient Spatial Temporal Interactions for Real Time Video Salient Object Detection
In this way, even though the overall video saliency quality depends heavily on the spatial branch, the performance of the temporal branch still matters.