Saliency Prediction
88 papers with code • 3 benchmarks • 7 datasets
A saliency model predicts human eye fixations on a visual scene, and its output is a saliency map. Saliency prediction is informed by the human visual attention mechanism and estimates the probability that the human eye fixates on a given location in the scene.
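As an illustration of what a per-location fixation probability means, the following minimal sketch turns a small grid of raw saliency scores into a probability map that sums to 1. The function name `to_fixation_density`, the softmax normalization, and the example scores are assumptions made for this illustration; they are not taken from any of the papers listed here.

```python
import math

def to_fixation_density(scores):
    """Normalize a 2-D grid of raw saliency scores into a probability map.

    Uses a softmax over all locations (an assumption for this sketch), so
    every entry is positive and the whole map sums to 1.
    """
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]

# Hypothetical raw scores for a 3x3 image; the center is the most salient.
scores = [
    [0.0, 1.0, 0.0],
    [1.0, 4.0, 1.0],
    [0.0, 1.0, 0.0],
]
density = to_fixation_density(scores)

# Location (row, col) with the highest predicted fixation probability.
best = max(
    ((r, c) for r in range(len(density)) for c in range(len(density[0]))),
    key=lambda rc: density[rc[0]][rc[1]],
)
```

Here `density[r][c]` can be read as the predicted probability that a fixation lands on location `(r, c)`, and `best` is the location the map considers most likely to attract the eye.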
Libraries
Use these libraries to find Saliency Prediction models and implementations
Most implemented papers
Unified Image and Video Saliency Modeling
We evaluate our method on the video saliency datasets DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and MIT300.
Generative Transformer for Accurate and Reliable Salient Object Detection
For the former, we apply a transformer to a deterministic model, and explain that its effective structure modeling and global context modeling abilities lead to its superior performance compared with CNN-based frameworks.
DeepGaze IIE: Calibrated prediction in and out-of-domain for state-of-the-art saliency modeling
Since 2014, transfer learning has been the key driver of improvement in spatial saliency prediction; however, progress has stagnated in the last 3-5 years.
Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet
Recent results suggest that state-of-the-art saliency models perform far from optimal in predicting fixations.
TurkerGaze: Crowdsourcing Saliency with Webcam based Eye Tracking
Traditional eye tracking requires specialized hardware, which means collecting gaze data from many observers is expensive, tedious and slow.
End-to-end Convolutional Network for Saliency Prediction
The prediction of salient areas in images has been traditionally addressed with hand-crafted features based on neuroscience principles.
Shallow and Deep Convolutional Networks for Saliency Prediction
The prediction of salient areas in images has been traditionally addressed with hand-crafted features based on neuroscience principles.
UberNet: Training a 'Universal' Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory
In this work we introduce a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture that is trained end-to-end.
Deep Visual Attention Prediction
Our model is based on a skip-layer network structure, which predicts human attention from multiple convolutional layers with various receptive fields.
Predicting Salient Face in Multiple-Face Videos
On the other hand, we find that the attention of different subjects consistently focuses on a single face in each frame of videos involving multiple faces.