Video Object Segmentation using Space-Time Memory Networks

We propose a novel solution for semi-supervised video object segmentation. By the nature of the problem, available cues (e.g. video frame(s) with object masks) become richer with the intermediate predictions. However, existing methods are unable to fully exploit this rich source of information. We resolve the issue by leveraging memory networks and learning to read relevant information from all available sources. In our framework, the past frames with object masks form an external memory, and the current frame, as the query, is segmented using the mask information in the memory. Specifically, the query and the memory are densely matched in the feature space, covering all the space-time pixel locations in a feed-forward fashion. In contrast to previous approaches, the abundant use of the guidance information allows us to better handle challenges such as appearance changes and occlusions. We validate our method on the latest benchmark sets and achieve state-of-the-art performance (overall score of 79.4 on the YouTube-VOS val set, J of 88.7 and 79.2 on the DAVIS 2016/2017 val sets, respectively) while having a fast runtime (0.16 seconds/frame on the DAVIS 2016 val set).
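The dense matching between the query and the memory described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the similarity function or normalization, so the dot-product similarity, the softmax over memory locations, the key/value split, and all names below (`space_time_memory_read`, `query_key`, `memory_key`, `memory_value`) are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def space_time_memory_read(query_key, memory_key, memory_value):
    """Illustrative memory read (assumed design, not the paper's code).

    query_key:    (H, W, Ck)     features of the current (query) frame
    memory_key:   (T, H, W, Ck)  features of T past frames with masks
    memory_value: (T, H, W, Cv)  mask-aware features of the same T frames
    Returns the read-out (H, W, Cv) and the matching weights (H*W, T*H*W).
    """
    H, W, Ck = query_key.shape
    T = memory_key.shape[0]

    # Flatten so that every query pixel is compared with every
    # space-time location in the memory.
    q = query_key.reshape(H * W, Ck)
    k = memory_key.reshape(T * H * W, Ck)
    v = memory_value.reshape(T * H * W, -1)

    # Assumed similarity: dot product between key features.
    scores = q @ k.T  # shape (H*W, T*H*W)

    # Softmax over all memory locations (max subtracted for stability).
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each query pixel retrieves a weighted sum of memory values.
    read = weights @ v
    return read.reshape(H, W, -1), weights


# Toy example with random features: 3 memory frames of size 4x5.
rng = np.random.default_rng(0)
read_out, match = space_time_memory_read(
    rng.standard_normal((4, 5, 8)),
    rng.standard_normal((3, 4, 5, 8)),
    rng.standard_normal((3, 4, 5, 16)),
)
print(read_out.shape, match.shape)
```

Because the weights are computed over all `T * H * W` memory locations in a single matrix product, adding more past frames to the memory only lengthens one axis of the matching matrix, which is consistent with the feed-forward description above.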

ICCV 2019

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Semi-Supervised Video Object Segmentation | DAVIS 2016 | STM | Jaccard (Mean) | 88.7 | # 31 |
| | | | Jaccard (Recall) | 97.4 | # 3 |
| | | | Jaccard (Decay) | 5.0 | # 28 |
| | | | F-measure (Mean) | 90.1 | # 36 |
| | | | F-measure (Recall) | 95.2 | # 7 |
| | | | F-measure (Decay) | 4.2 | # 31 |
| | | | J&F | 89.4 | # 33 |
| Interactive Video Object Segmentation | DAVIS 2017 | STM | AUC-J&F | 0.803 | # 4 |
| | | | J&F@60s | 0.848 | # 3 |
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (val) | STM | Jaccard (Mean) | 79.2 | # 36 |
| | | | Jaccard (Recall) | 88.7 | # 4 |
| | | | Jaccard (Decay) | 8.0 | # 3 |
| | | | F-measure (Mean) | 84.3 | # 39 |
| | | | F-measure (Recall) | 91.8 | # 3 |
| | | | F-measure (Decay) | 10.5 | # 2 |
| | | | J&F | 81.75 | # 39 |
| Semi-Supervised Video Object Segmentation | DAVIS (no YouTube-VOS training) | STM | FPS | 6.25 | # 17 |
| | | | D16 val (G) | 86.5 | # 6 |
| | | | D16 val (J) | 84.8 | # 10 |
| | | | D16 val (F) | 88.1 | # 3 |
| | | | D17 val (G) | 71.6 | # 16 |
| | | | D17 val (J) | 69.2 | # 17 |
| | | | D17 val (F) | 74.0 | # 16 |
| Semi-Supervised Video Object Segmentation | YouTube-VOS 2018 | STM | Overall | 68.2 | # 45 |
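
Several of the reported values are related by simple arithmetic: on DAVIS, J&F is the average of the Jaccard mean and the F-measure mean, and FPS is the reciprocal of the per-frame runtime. The short sketch below checks this against the numbers reported on this page; the function names are hypothetical helpers and not part of any benchmark toolkit.

```python
def mean_j_and_f(j_mean, f_mean):
    """J&F as the average of the Jaccard mean and the F-measure mean."""
    return (j_mean + f_mean) / 2.0


def frames_per_second(seconds_per_frame):
    """Convert a per-frame runtime in seconds into frames per second."""
    return 1.0 / seconds_per_frame


# Values taken from the results reported on this page.
davis16 = mean_j_and_f(88.7, 90.1)   # reported J&F: 89.4
davis17 = mean_j_and_f(79.2, 84.3)   # reported J&F: 81.75
fps = frames_per_second(0.16)        # reported FPS: 6.25

print(round(davis16, 2), round(davis17, 2), round(fps, 2))
```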

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Semi-Supervised Video Object Segmentation | DAVIS 2017 (test-dev) | STM | J&F | 72.2 | # 37 |
| | | | Jaccard (Mean) | 69.3 | # 37 |
| | | | Jaccard (Recall) | 78.0 | # 3 |
| | | | Jaccard (Decay) | 16.9 | # 3 |
| | | | F-measure (Mean) | 75.2 | # 39 |
| | | | F-measure (Recall) | 83.0 | # 4 |
| | | | F-measure (Decay) | 17.5 | # 5 |
