RANet: Ranking Attention Network for Fast Video Object Segmentation

ICCV 2019  ·  Ziqin Wang, Jun Xu, Li Liu, Fan Zhu, Ling Shao

Although online learning (OL) techniques have boosted the performance of semi-supervised video object segmentation (VOS) methods, their huge time cost greatly restricts their practicality. Matching-based and propagation-based methods run faster by avoiding OL. However, their accuracy is sub-optimal because of mismatching and drifting problems. In this paper, we develop a real-time yet very accurate Ranking Attention Network (RANet) for VOS. Specifically, to integrate the insights of matching-based and propagation-based methods, we employ an encoder-decoder framework to learn pixel-level similarity and segmentation in an end-to-end manner. To better utilize the similarity maps, we propose a novel ranking attention module, which automatically ranks and selects these maps for fine-grained VOS performance. Experiments on the DAVIS-16 and DAVIS-17 datasets show that our RANet achieves the best speed-accuracy trade-off, e.g., 33 milliseconds per frame and J&F=85.5% on DAVIS-16. With OL, our RANet reaches J&F=87.1% on DAVIS-16, exceeding state-of-the-art VOS methods. The code can be found at https://github.com/Storife/RANet.
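The idea of ranking and selecting pixel-level similarity maps can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `ranked_similarity_maps`, the use of cosine similarity, the choice of the per-map maximum as the ranking score, and the zero-padding to a fixed count `k` are all assumptions made here for illustration. In the paper the ranking is learned as part of the network.

```python
import numpy as np

def ranked_similarity_maps(template_feat, template_mask, frame_feat, k=4):
    """Hypothetical helper: one similarity map per foreground template pixel,
    ranked by a simple score and cut or padded to exactly k maps.

    template_feat: (C, Ht, Wt) features of the annotated first frame
    template_mask: (Ht, Wt) boolean foreground mask
    frame_feat:    (C, H, W) features of the current frame
    """
    c, h, w = frame_feat.shape

    def normalise(x):
        # Flatten spatial dims and L2-normalise each pixel embedding,
        # so that dot products below are cosine similarities.
        flat = x.reshape(c, -1)
        return flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)

    t = normalise(template_feat)[:, template_mask.reshape(-1)]  # (C, N_fg)
    f = normalise(frame_feat)                                    # (C, H*W)

    # One similarity map over the current frame per foreground template pixel.
    sims = (t.T @ f).reshape(t.shape[1], h, w)                   # (N_fg, H, W)

    # Assumed ranking score: the strongest response of each map.
    scores = sims.max(axis=(1, 2))
    order = np.argsort(-scores)[:k]                              # best first
    selected = sims[order]

    # Pad with zero maps so the output always has k channels.
    if len(selected) < k:
        pad = np.zeros((k - len(selected), h, w), dtype=sims.dtype)
        selected = np.concatenate([selected, pad], axis=0)
    return selected, scores[order]
```

A fixed number of output maps matters because the number of foreground pixels changes from video to video, while a convolutional decoder expects a constant channel count.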

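Task: Semi-Supervised Video Object Segmentation

Dataset                          Model                     Metric              Value  Global Rank
DAVIS 2016                       RANet                     Jaccard (Mean)      85.5   #51
DAVIS 2016                       RANet                     Jaccard (Recall)    97.2   #5
DAVIS 2016                       RANet                     Jaccard (Decay)     6.2    #23
DAVIS 2016                       RANet                     F-measure (Mean)    85.4   #50
DAVIS 2016                       RANet                     F-measure (Recall)  94.9   #8
DAVIS 2016                       RANet                     F-measure (Decay)   5.1    #26
DAVIS 2016                       RANet                     J&F                 85.45  #51
DAVIS 2016                       RANet+ (online learning)  Jaccard (Mean)      86.6   #43
DAVIS 2016                       RANet+ (online learning)  Jaccard (Recall)    97     #8
DAVIS 2016                       RANet+ (online learning)  Jaccard (Decay)     7.4    #19
DAVIS 2016                       RANet+ (online learning)  F-measure (Mean)    87.6   #44
DAVIS 2016                       RANet+ (online learning)  F-measure (Recall)  96.1   #3
DAVIS 2016                       RANet+ (online learning)  F-measure (Decay)   8.2    #20
DAVIS 2016                       RANet+ (online learning)  J&F                 87.1   #43
DAVIS 2017 (test-dev)            RANet                     J&F                 55.4   #48
DAVIS 2017 (test-dev)            RANet                     Jaccard (Mean)      53.4   #47
DAVIS 2017 (test-dev)            RANet                     Jaccard (Recall)    61.9   #10
DAVIS 2017 (test-dev)            RANet                     Jaccard (Decay)     21.9   #13
DAVIS 2017 (test-dev)            RANet                     F-measure (Mean)    57.3   #49
DAVIS 2017 (test-dev)            RANet                     F-measure (Recall)  67.7   #11
DAVIS 2017 (test-dev)            RANet                     F-measure (Decay)   22.1   #14
DAVIS 2017 (val)                 RANet                     Jaccard (Mean)      63.2   #64
DAVIS 2017 (val)                 RANet                     Jaccard (Recall)    73.7   #16
DAVIS 2017 (val)                 RANet                     Jaccard (Decay)     18.6   #15
DAVIS 2017 (val)                 RANet                     F-measure (Mean)    68.2   #65
DAVIS 2017 (val)                 RANet                     F-measure (Recall)  78.8   #14
DAVIS 2017 (val)                 RANet                     F-measure (Decay)   19.7   #13
DAVIS 2017 (val)                 RANet                     J&F                 65.7   #65
DAVIS (no YouTube-VOS training)  RANet                     FPS                 30.3   #6
DAVIS (no YouTube-VOS training)  RANet                     D16 val (G)         85.5   #10
DAVIS (no YouTube-VOS training)  RANet                     D16 val (J)         85.5   #7
DAVIS (no YouTube-VOS training)  RANet                     D16 val (F)         85.4   #10
DAVIS (no YouTube-VOS training)  RANet                     D17 val (G)         65.7   #24
DAVIS (no YouTube-VOS training)  RANet                     D17 val (J)         63.2   #24
DAVIS (no YouTube-VOS training)  RANet                     D17 val (F)         68.2   #24
DAVIS (no YouTube-VOS training)  RANet                     D17 test (G)        55.3   #6
DAVIS (no YouTube-VOS training)  RANet                     D17 test (J)        53.4   #6
DAVIS (no YouTube-VOS training)  RANet                     D17 test (F)        57.2   #7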
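
As a quick consistency check on the figures listed above, the short sketch below recomputes two derived quantities. It assumes the standard DAVIS convention that J&F (also written G) is the arithmetic mean of the Jaccard mean and the F-measure mean, and that frames per second is the reciprocal of the per-frame time. The helper names `j_and_f` and `fps_from_ms` are illustrative and do not come from the RANet code.

```python
def j_and_f(j_mean, f_mean):
    """Mean of region similarity (J) and contour accuracy (F), in percent."""
    return (j_mean + f_mean) / 2

def fps_from_ms(ms_per_frame):
    """Convert a per-frame latency in milliseconds to frames per second."""
    return 1000.0 / ms_per_frame

# DAVIS 2016, RANet: J mean 85.5, F mean 85.4
print(round(j_and_f(85.5, 85.4), 2))   # 85.45
# DAVIS 2016, RANet+ (online learning): J mean 86.6, F mean 87.6
print(round(j_and_f(86.6, 87.6), 2))   # 87.1
# DAVIS 2017 (val), RANet: J mean 63.2, F mean 68.2
print(round(j_and_f(63.2, 68.2), 2))   # 65.7
# 33 ms per frame, as quoted in the abstract
print(round(fps_from_ms(33), 1))       # 30.3
```

The recomputed values agree with the listed J&F entries for DAVIS 2016 and DAVIS 2017 (val), and 33 ms per frame corresponds to the listed 30.3 FPS.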

Methods