Saliency Prediction
88 papers with code • 3 benchmarks • 7 datasets
A saliency map predicts where human eye fixations fall on a visual scene. Saliency prediction is informed by the human visual attention mechanism and estimates the probability that the human eye fixates on a given position in the scene.
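To make the definition concrete, here is a minimal sketch of a saliency map treated as a probability distribution over pixel locations. It is not taken from any paper listed here. The function name `fixation_density`, the Gaussian spread `sigma`, and the sample fixation coordinates are all assumptions chosen for illustration.

```python
import numpy as np


def fixation_density(fixations, height, width, sigma=2.0):
    """Turn (row, col) fixation points into a probability map over pixels.

    Hypothetical helper: each fixation contributes an isotropic Gaussian
    bump, and the sum is normalized so that the map sums to 1.
    """
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    density = np.zeros((height, width), dtype=np.float64)
    for r, c in fixations:
        squared_distance = (rows - r) ** 2 + (cols - c) ** 2
        density += np.exp(-squared_distance / (2.0 * sigma ** 2))
    total = density.sum()
    if total == 0:
        # No fixations recorded: fall back to a uniform distribution.
        return np.full((height, width), 1.0 / (height * width))
    return density / total


# Made-up fixation coordinates, for illustration only.
saliency = fixation_density([(4, 4), (4, 5), (10, 12)], height=16, width=16)
peak = np.unravel_index(np.argmax(saliency), saliency.shape)
```

Each entry of `saliency` can then be read as the estimated probability that a fixation lands on that pixel, and `peak` is the most salient location under this toy model.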
Libraries
Use these libraries to find Saliency Prediction models and implementations
Latest papers with no code
CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
Incorporating the audio stream enables Video Saliency Prediction (VSP) to imitate the selective attention mechanism of the human brain.
Semantic Segmentation Enhanced Transformer Model for Human Attention Prediction
We find that, in practice, simply adding the subtask might confuse the main task learning; hence, a Multi-task Attention Module is proposed to deal with the feature interaction between the multiple learning targets.
Co-Salient Object Detection With Uncertainty-Aware Group Exchange-Masking
To address this issue, this paper presents a group exchange-masking (GEM) strategy for robust CoSOD model learning.
FBLNet: FeedBack Loop Network for Driver Attention Prediction
Driving experience is extremely important for safe driving: a skilled driver can effortlessly anticipate oncoming danger (before it becomes salient) based on that experience and quickly attend to the corresponding zones. However, non-objective driving experience is difficult to model, so existing methods lack a mechanism that simulates how a driver accumulates experience, and they usually follow the technical approach of saliency prediction methods to predict driver attention.
Context-empowered Visual Attention Prediction in Pedestrian Scenarios
In this paper, we present Context-SalNET, a novel encoder-decoder architecture that explicitly addresses three key challenges of visual attention prediction in pedestrians: First, Context-SalNET explicitly models the context factors urgency and safety preference in the latent space of the encoder-decoder model.
Saliency-based Multiple Region of Interest Detection from a Single 360° image
360° images are informative: they contain omnidirectional visual information around the camera.
Dual Domain-Adversarial Learning for Audio-Visual Saliency Prediction
The domain discrepancy leads to performance degradation on target testing data for CNN models.
Rethinking gradient weights' influence over saliency map estimation
Typical gradient-oriented CAM studies rely on weighted aggregation for saliency map estimation by projecting the gradient maps into single weight values, which may lead to over-generalized saliency maps.
SiaTrans: Siamese Transformer Network for RGB-D Salient Object Detection with Depth Image Classification
A Transformer-based cross-modality fusion module (CMF) can effectively fuse RGB and depth information.
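The entry above on gradient weights describes gradient-oriented CAM methods as projecting each gradient map into a single weight before aggregation. As a rough illustration, the sketch below follows the widely used Grad-CAM formulation (spatially averaged gradients as channel weights, then a ReLU over the weighted sum). This is an assumption for illustration and is not the method proposed in that paper. The function name `grad_cam_map` and the toy arrays are hypothetical.

```python
import numpy as np


def grad_cam_map(activations, gradients):
    """Grad-CAM-style aggregation (illustrative sketch).

    activations, gradients: arrays of shape (channels, height, width),
    assumed to come from one convolutional layer of some network.
    """
    # Project each gradient map to a single scalar weight per channel.
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum of activation maps over channels, giving (height, width).
    cam = np.tensordot(weights, activations, axes=1)
    # Keep only positive evidence, then scale to [0, 1] when possible.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy inputs: two channels on a 2x2 grid, values made up for illustration.
toy_activations = np.array([[[1.0, 0.0], [0.0, 0.0]],
                            [[0.0, 2.0], [0.0, 0.0]]])
toy_gradients = np.array([[[1.0, 1.0], [1.0, 1.0]],
                          [[-1.0, -1.0], [-1.0, -1.0]]])
toy_cam = grad_cam_map(toy_activations, toy_gradients)
```

Because every spatial location in a channel shares one scalar weight, any spatial variation in the gradient is averaged away, which is the projection step that entry refers to.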