Flow Guided Recurrent Neural Encoder for Video Salient Object Detection
Image saliency detection has recently witnessed significant progress due to deep convolutional neural networks. However, extending state-of-the-art saliency detectors from image to video is challenging. The performance of salient object detection suffers from object or camera motion and from dramatic changes in appearance contrast across video frames. In this paper, we present the flow guided recurrent neural encoder (FGRNE), an accurate, end-to-end learning framework for video salient object detection. It works by enhancing the temporal coherence of per-frame features, exploiting both motion information in the form of optical flow and sequential feature evolution encoded by LSTM networks. It can be considered a universal framework for extending any FCN-based static saliency detector to video salient object detection. Extensive experimental results verify the effectiveness of each part of FGRNE and confirm that the proposed method significantly outperforms state-of-the-art methods on the public DAVIS and FBMS benchmarks.
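The two ingredients named in the abstract, aligning neighbouring-frame features with optical flow and then encoding the aligned sequence recurrently, can be illustrated with a toy sketch. This is not the authors' implementation. The function names (`bilinear_sample`, `warp`, `encode`) are hypothetical, features are single-channel nested lists rather than FCN feature tensors, and a plain running blend stands in for the LSTM encoder, which the abstract does not specify in detail.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D grid at a fractional (y, x); locations outside the grid read as 0."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                total += weight * feat[yy][xx]
    return total


def warp(feat, flow):
    """Align a neighbour frame's features to the reference frame.

    Assumption: flow[y][x] = (dy, dx) says where reference pixel (y, x)
    is found in the neighbour frame.
    """
    h, w = len(feat), len(feat[0])
    return [
        [bilinear_sample(feat, y + flow[y][x][0], x + flow[y][x][1]) for x in range(w)]
        for y in range(h)
    ]


def encode(aligned_feats, decay=0.5):
    """Stand-in for the recurrent encoder: a running blend over the aligned sequence.

    A real model would use a learned (convolutional) LSTM here; `decay` is illustrative.
    """
    state = aligned_feats[0]
    for feat in aligned_feats[1:]:
        state = [
            [decay * s + (1 - decay) * f for s, f in zip(s_row, f_row)]
            for s_row, f_row in zip(state, feat)
        ]
    return state


# Neighbour frame: the object sits one pixel to the right of its reference position.
neighbour = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
reference = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
flow = [[(0, 1)] * 3 for _ in range(3)]

aligned = warp(neighbour, flow)
fused = encode([aligned, reference])
```

After warping, the neighbour's response lands on the same pixel as the reference frame's, so the fused feature reinforces the object instead of smearing it across two locations. That is the intuition behind using flow to improve temporal coherence before recurrent encoding.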
Results from the Paper
Ranked #2 on Video Salient Object Detection on DAVSOD-Difficult20 (using extra training data)
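Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Video Salient Object Detection | DAVIS-2016 | FGRN | S-Measure | 0.838 | #6
Video Salient Object Detection | DAVIS-2016 | FGRN | max E-measure | 0.917 | #4
Video Salient Object Detection | DAVIS-2016 | FGRN | max F-measure | 0.783 | #5
Video Salient Object Detection | DAVIS-2016 | FGRN | Average MAE | 0.043 | #5
Video Salient Object Detection | DAVSOD-Difficult20 | FGRN | S-Measure | 0.608 | #2
Video Salient Object Detection | DAVSOD-Difficult20 | FGRN | max E-measure | 0.698 | #1
Video Salient Object Detection | DAVSOD-Difficult20 | FGRN | Average MAE | 0.131 | #3
Video Salient Object Detection | DAVSOD-easy35 | FGRN | S-Measure | 0.701 | #3
Video Salient Object Detection | DAVSOD-easy35 | FGRN | max F-measure | 0.589 | #3
Video Salient Object Detection | DAVSOD-easy35 | FGRN | max E-measure | 0.765 | #2
Video Salient Object Detection | DAVSOD-easy35 | FGRN | Average MAE | 0.095 | #2
Video Salient Object Detection | DAVSOD-Normal25 | FGRN | S-Measure | 0.638 | #3
Video Salient Object Detection | DAVSOD-Normal25 | FGRN | max E-measure | 0.700 | #2
Video Salient Object Detection | DAVSOD-Normal25 | FGRN | Average MAE | 0.126 | #2
Video Salient Object Detection | FBMS-59 | FGRN | S-Measure | 0.809 | #6
Video Salient Object Detection | FBMS-59 | FGRN | Average MAE | 0.088 | #7
Video Salient Object Detection | FBMS-59 | FGRN | max E-measure | 0.863 | #4
Video Salient Object Detection | FBMS-59 | FGRN | max F-measure | 0.767 | #7
Video Salient Object Detection | MCL | FGRN | S-Measure | 0.709 | #4
Video Salient Object Detection | MCL | FGRN | max E-measure | 0.817 | #5
Video Salient Object Detection | MCL | FGRN | max F-measure | 0.625 | #3
Video Salient Object Detection | MCL | FGRN | Average MAE | 0.044 | #6
Video Salient Object Detection | UVSD | FGRN | S-Measure | 0.745 | #3
Video Salient Object Detection | UVSD | FGRN | max E-measure | 0.887 | #3
Video Salient Object Detection | UVSD | FGRN | Average MAE | 0.042 | #3
Video Salient Object Detection | ViSal | FGRN | S-Measure | 0.861 | #4
Video Salient Object Detection | ViSal | FGRN | max E-measure | 0.945 | #3
Video Salient Object Detection | ViSal | FGRN | Average MAE | 0.045 | #4
Video Salient Object Detection | VOS-T | FGRN | S-Measure | 0.715 | #5
Video Salient Object Detection | VOS-T | FGRN | max E-measure | 0.797 | #4
Video Salient Object Detection | VOS-T | FGRN | Average MAE | 0.097 | #4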
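
To make two of the metrics in the table concrete, the following minimal sketch computes MAE and max F-measure for a single frame. It is an illustration under stated assumptions, not the benchmark's evaluation code. The weighting `beta_sq=0.3` is the value commonly used in the saliency literature and is assumed here, as are the 256-step threshold sweep and the function names. Benchmark scores are typically averaged over all frames of a dataset, which this sketch omits. S-measure and E-measure are not covered.

```python
def mean_absolute_error(saliency, mask):
    """Mean absolute difference between a saliency map in [0, 1] and a binary mask."""
    pairs = [(s, g) for s_row, g_row in zip(saliency, mask) for s, g in zip(s_row, g_row)]
    return sum(abs(s - g) for s, g in pairs) / len(pairs)


def f_measure(saliency, mask, threshold, beta_sq=0.3):
    """F-measure of the saliency map binarized at `threshold` (beta_sq is an assumed default)."""
    tp = fp = fn = 0
    for s_row, g_row in zip(saliency, mask):
        for s, g in zip(s_row, g_row):
            predicted = s >= threshold
            if predicted and g:
                tp += 1
            elif predicted:
                fp += 1
            elif g:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def max_f_measure(saliency, mask, steps=256):
    """Best F-measure over evenly spaced thresholds in [0, 1]."""
    return max(f_measure(saliency, mask, t / (steps - 1)) for t in range(steps))


prediction = [[0.9, 0.2], [0.4, 0.1]]
ground_truth = [[1, 0], [1, 0]]

mae = mean_absolute_error(prediction, ground_truth)
best_f = max_f_measure(prediction, ground_truth)
```

Lower is better for MAE, while higher is better for the F-measure, which is why the two can rank methods differently on the same dataset.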