DeepVS: A Deep Learning Based Video Saliency Prediction Approach

ECCV 2018 · Lai Jiang, Mai Xu, Tie Liu, Minglang Qiao, Zulin Wang

In this paper, we propose a novel deep learning based video saliency prediction method, named DeepVS. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which includes 32 subjects' fixations on 538 videos. We find from LEDOV that human attention is more likely to be attracted by objects, particularly moving objects or the moving parts of objects. Hence, for DeepVS we develop an object-to-motion convolutional neural network (OM-CNN) to predict intra-frame saliency; OM-CNN is composed of an objectness subnet and a motion subnet. In OM-CNN, a cross-net mask and hierarchical feature normalization are proposed to combine the spatial features of the objectness subnet with the temporal features of the motion subnet. We further find from our database that human attention is temporally correlated, with a smooth saliency transition across video frames. We therefore propose a saliency-structured convolutional long short-term memory (SS-ConvLSTM) network, which takes the features extracted by OM-CNN as input. SS-ConvLSTM then generates the inter-frame saliency maps of a video, accounting for both a structured output with center bias and the cross-frame transitions of human attention maps. Finally, the experimental results show that DeepVS advances the state of the art in video saliency prediction.
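The abstract says SS-ConvLSTM consumes OM-CNN features frame by frame to model cross-frame transitions of attention. To make that recurrence concrete, here is a minimal NumPy sketch of a generic ConvLSTM cell (gates computed by convolutions instead of matrix products). It is not the paper's saliency-structured variant: it assumes a single input channel and a single hidden channel, omits peephole connections and any center-bias term, and uses random, untrained 3x3 kernels. The function names (`conv2d_same`, `init_params`, `convlstm_step`, `run_sequence`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, k):
    """Single-channel 2-D cross-correlation with zero padding (output shape == input shape)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))  # zero ("constant") padding by default
    out = np.zeros(x.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def init_params(k=3, seed=0):
    """Hypothetical random (untrained) kernels: one input and one hidden kernel per gate."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("i", "f", "o", "g"):
        params["wx_" + gate] = rng.normal(0.0, 0.1, (k, k))
        params["wh_" + gate] = rng.normal(0.0, 0.1, (k, k))
        params["b_" + gate] = 0.0
    return params


def convlstm_step(x, h, c, p):
    """One ConvLSTM update: gates see the current frame x and the previous hidden map h."""
    def pre(gate):
        return conv2d_same(x, p["wx_" + gate]) + conv2d_same(h, p["wh_" + gate]) + p["b_" + gate]

    i = sigmoid(pre("i"))   # input gate
    f = sigmoid(pre("f"))   # forget gate: how much of the previous cell state survives
    o = sigmoid(pre("o"))   # output gate
    g = np.tanh(pre("g"))   # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def run_sequence(frames, params):
    """Run the cell over a list of 2-D feature maps, returning one hidden map per frame."""
    h = np.zeros(frames[0].shape, dtype=float)
    c = np.zeros(frames[0].shape, dtype=float)
    outputs = []
    for x in frames:
        h, c = convlstm_step(x, h, c, params)
        outputs.append(h)
    return outputs
```

Because the hidden and cell maps are carried from one frame to the next, each output depends on earlier frames, which is the mechanism that lets a ConvLSTM produce temporally smooth maps. In a real model the kernels would be learned and the hidden maps decoded into saliency maps.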

Task: Video Saliency Detection
Benchmark: MSU Video Saliency Prediction
Model: DeepVS

Metric   Value   Global rank
SIM      0.548   #12
CC       0.586   #11
NSS      1.44    #11
AUC-J    0.804   #14
KLDiv    0.707   #12
FPS      3.29    #9

Higher is better for every metric except KLDiv, where lower is better.
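As a rough aid to reading the table, here is a NumPy sketch of commonly used definitions of four of these metrics: CC (Pearson correlation between predicted and ground-truth maps), SIM (histogram intersection of the two maps normalized to sum to 1), NSS (mean of the z-scored prediction at fixated pixels) and KL divergence of the prediction from the ground truth. The epsilon handling and normalization conventions are assumptions and may differ from those used by the MSU benchmark; AUC-J is omitted. The function names are hypothetical.

```python
import numpy as np

EPS = np.finfo(float).eps  # small constant to avoid division by zero / log(0)


def _as_distribution(m):
    """Normalize a non-negative saliency map so that it sums to (about) 1."""
    m = np.asarray(m, dtype=float)
    return m / (m.sum() + EPS)


def cc(pred, gt):
    """Pearson linear correlation between predicted and ground-truth saliency maps."""
    return float(np.corrcoef(np.ravel(pred), np.ravel(gt))[0, 1])


def sim(pred, gt):
    """Similarity: sum of pixel-wise minima of the two normalized maps (1 = identical)."""
    return float(np.minimum(_as_distribution(pred), _as_distribution(gt)).sum())


def nss(pred, fixations):
    """Normalized scanpath saliency: average z-scored prediction at fixated pixels."""
    pred = np.asarray(pred, dtype=float)
    z = (pred - pred.mean()) / pred.std()  # assumes a non-constant prediction
    return float(z[np.asarray(fixations).astype(bool)].mean())


def kld(pred, gt):
    """KL divergence of the predicted distribution P from the ground-truth distribution Q."""
    p, q = _as_distribution(pred), _as_distribution(gt)
    return float(np.sum(q * np.log(EPS + q / (p + EPS))))
```

For example, a prediction that is zero everywhere except one pixel, scored against a single fixation on that same pixel of a 4x4 map, gets an NSS of sqrt(15), about 3.87.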
