BATMAN: Bilateral Attention Transformer in Motion-Appearance Neighboring Space for Video Object Segmentation

1 Aug 2022 · Ye Yu, Jialin Yuan, Gaurav Mittal, Li Fuxin, Mei Chen

Video Object Segmentation (VOS) is fundamental to video understanding. Transformer-based methods show significant performance improvement on semi-supervised VOS. However, existing work faces challenges segmenting visually similar objects in close proximity to each other. In this paper, we propose a novel Bilateral Attention Transformer in Motion-Appearance Neighboring space (BATMAN) for semi-supervised VOS. It captures object motion in the video via a novel optical flow calibration module that fuses the segmentation mask with optical flow estimation to improve within-object optical flow smoothness and reduce noise at object boundaries. This calibrated optical flow is then employed in our novel bilateral attention, which computes the correspondence between the query and reference frames in the neighboring bilateral space considering both motion and appearance. Extensive experiments validate the effectiveness of the BATMAN architecture, which outperforms all existing state-of-the-art methods on all four popular VOS benchmarks: YouTube-VOS 2019 (85.0%), YouTube-VOS 2018 (85.3%), DAVIS 2017 val/test-dev (86.2%/82.2%), and DAVIS 2016 (92.5%).
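The idea of attending only within a neighborhood defined by both position and motion can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `bilateral_attention`, the parameters `radius` and `flow_tol`, and the hard thresholding of spatial and flow distances are all assumptions made here for illustration. Query tokens attend only to reference tokens that are spatially close and have similar optical flow; appearance enters through the usual dot-product scores.

```python
import numpy as np

def bilateral_attention(q, k, v, pos_q, pos_k, flow_q, flow_k, radius, flow_tol):
    """Softmax attention restricted to a motion-and-position neighborhood.

    Illustrative only: `radius` and `flow_tol` are hypothetical thresholds,
    not parameters taken from the paper.
    """
    # Appearance similarity: scaled dot product between query and reference tokens.
    scores = q @ k.T / np.sqrt(q.shape[1])
    # Spatial neighborhood: reference tokens within `radius` of the query token.
    near = np.linalg.norm(pos_q[:, None] - pos_k[None], axis=-1) <= radius
    # Motion neighborhood: reference tokens whose flow differs by at most `flow_tol`.
    similar_motion = np.linalg.norm(flow_q[:, None] - flow_k[None], axis=-1) <= flow_tol
    mask = near & similar_motion

    # Masked, numerically stable softmax; rows without neighbors get zero weights.
    masked = np.where(mask, scores, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    row_max = np.where(np.isfinite(row_max), row_max, 0.0)
    weights = np.exp(masked - row_max)
    denom = weights.sum(axis=1, keepdims=True)
    weights = np.divide(weights, denom, out=np.zeros_like(weights), where=denom > 0)
    return weights @ v, weights

# One query token and three reference tokens with identical appearance.
q = np.array([[1.0, 0.0]])
k = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
v = np.array([[10.0], [20.0], [30.0]])
pos_q = np.array([[0.0, 0.0]])
pos_k = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
flow_q = np.array([[1.0, 0.0]])
flow_k = np.array([[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]])

out, weights = bilateral_attention(q, k, v, pos_q, pos_k, flow_q, flow_k,
                                   radius=2.0, flow_tol=0.5)
print(weights)
print(out)
```

In this toy setup the three reference tokens look identical, so appearance alone cannot separate them. The second token is nearby but moves in the opposite direction, and the third moves the same way but is far away, so only the first token receives attention. This mirrors the motivation stated in the abstract: visually similar objects in close proximity may still be distinguished by their motion.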


Results from the Paper


 Ranked #1 on Video Object Segmentation on DAVIS 2017 (test-dev) (using extra training data)

Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Uses Extra Training Data | Result | Benchmark
Video Object Segmentation DAVIS 2016 RMN (val) Jaccard (Mean) 88.9 # 12
F-Score 88.7 # 17
J&F 88.8 # 15
Video Object Segmentation DAVIS 2016 AOT (val) Jaccard (Mean) 90.1 # 7
F-Score 92.1 # 9
J&F 91.1 # 7
Video Object Segmentation DAVIS 2016 LCM (val) Jaccard (Mean) 89.9 # 8
F-Score 91.4 # 11
J&F 90.7 # 8
Video Object Segmentation DAVIS 2016 STM (val) Jaccard (Mean) 88.7 # 13
F-Score 89.9 # 16
Video Object Segmentation DAVIS 2016 BATMAN (val) Jaccard (Mean) 90.7 # 4
F-Score 94.2 # 3
J&F 92.5 # 3
Video Object Segmentation DAVIS 2016 STCN (val) Jaccard (Mean) 90.8 # 3
F-Score 92.5 # 8
J&F 91.6 # 4
Video Object Segmentation DAVIS 2016 CFBI (val) Jaccard (Mean) 88.3 # 16
F-Score 90.5 # 14
J&F 89.4 # 14
Video Object Segmentation DAVIS 2016 KMN (val) Jaccard (Mean) 89.5 # 10
F-Score 91.5 # 10
J&F 90.5 # 10
Video Object Segmentation DAVIS 2016 CFBI+ (val) Jaccard (Mean) 88.7 # 13
F-Score 91.1 # 13
J&F 89.9 # 12
Video Object Segmentation DAVIS 2016 RPCMVOS (val) Jaccard (Mean) 87.1 # 17
F-Score 94 # 5
J&F 90.6 # 9
Video Object Segmentation DAVIS 2016 TransVOS (val) Jaccard (Mean) 89.8 # 9
F-Score 91.2 # 12
J&F 90.5 # 10
Video Object Segmentation DAVIS 2017 (test-dev) STCN Jaccard 72.7 # 7
F-measure 79.6 # 7
Mean Jaccard & F-Measure 76.1 # 7
Video Object Segmentation DAVIS 2017 (test-dev) CFBI+ Jaccard 71.6 # 9
Mean Jaccard & F-Measure 75.6 # 8
Video Object Segmentation DAVIS 2017 (test-dev) RMN Jaccard 71.9 # 8
F-measure 78.1 # 9
Video Object Segmentation DAVIS 2017 (test-dev) KMN Jaccard 74.1 # 5
F-measure 80.3 # 6
Mean Jaccard & F-Measure 77.2 # 5
Video Object Segmentation DAVIS 2017 (test-dev) BATMAN Jaccard 78.4 # 1
F-measure 86.1 # 1
Mean Jaccard & F-Measure 82.2 # 1
Video Object Segmentation DAVIS 2017 (test-dev) TransVOS Jaccard 73 # 6
F-measure 80.9 # 5
Mean Jaccard & F-Measure 76.9 # 6
Video Object Segmentation DAVIS 2017 (test-dev) CFBI Jaccard 71.4 # 10
F-measure 78.7 # 8
Mean Jaccard & F-Measure 75 # 9
Video Object Segmentation DAVIS 2017 (test-dev) LCM Jaccard 74.4 # 4
F-measure 81.8 # 4
Mean Jaccard & F-Measure 78.1 # 4
Video Object Segmentation DAVIS 2017 (val) AOT Mean Jaccard & F-Measure 84.9 # 5
Jaccard 82.3 # 3
F-measure 87.5 # 5
Video Object Segmentation DAVIS 2017 (val) BATMAN Mean Jaccard & F-Measure 86.2 # 2
F-measure 89.3 # 3
Video Object Segmentation DAVIS 2017 (val) STCN Mean Jaccard & F-Measure 85.4 # 4
Jaccard 82.2 # 4
F-measure 88.6 # 4
Video Object Segmentation DAVIS 2017 (val) TransVOS Mean Jaccard & F-Measure 83.9 # 6
Jaccard 81.4 # 5
F-measure 86.4 # 7
Video Object Segmentation DAVIS 2017 (val) RPCMVOS Mean Jaccard & F-Measure 83.7 # 7
Jaccard 81.3 # 6
Video Object Segmentation DAVIS 2017 (val) LCM Jaccard 80.5 # 8
F-measure 86.5 # 6
Video Object Segmentation DAVIS 2017 (val) RMN Mean Jaccard & F-Measure 83.5 # 8
Jaccard 81 # 7
F-measure 86 # 8
Video Object Segmentation DAVIS 2017 (val) CFBI+ Mean Jaccard & F-Measure 82.9 # 9
Jaccard 80.1 # 9
F-measure 85.7 # 9
Video Object Segmentation DAVIS 2017 (val) KMN Mean Jaccard & F-Measure 82.8 # 10
Jaccard 80 # 10
F-measure 85.6 # 10
Video Object Segmentation DAVIS 2017 (val) SST Mean Jaccard & F-Measure 82.5 # 11
Jaccard 79.9 # 11
F-measure 85.1 # 11
Video Object Segmentation DAVIS 2017 (val) CFBI Mean Jaccard & F-Measure 81.9 # 12
Jaccard 79.3 # 12
F-measure 84.5 # 12
Video Object Segmentation DAVIS 2017 (val) STM Jaccard 79.2 # 13
F-measure 84.3 # 13
Video Object Segmentation DAVIS 2017 (val) LWL Mean Jaccard & F-Measure 81.6 # 13
Jaccard 79.1 # 14
F-measure 84.1 # 14
Video Object Segmentation DAVIS 2017 (val) AFB-URR Mean Jaccard & F-Measure 74.6 # 15
Jaccard 73 # 15
F-measure 76.1 # 16
Video Object Segmentation YouTube-VOS 2018 CFBI Jaccard (Seen) 81.1 # 11
F-Measure (Seen) 85.8 # 7
Visual Object Tracking YouTube-VOS 2018 KMN Jaccard (Unseen) 75.3 # 2
Visual Object Tracking YouTube-VOS 2018 CFBI F-Measure (Unseen) 83.4 # 1
Visual Object Tracking YouTube-VOS 2018 RMN Jaccard (Unseen) 75.7 # 1
Visual Object Tracking YouTube-VOS 2018 TransVOS F-Measure (Seen) 86.7 # 1
F-Measure (Unseen) 83.4 # 1
Video Object Segmentation YouTube-VOS 2018 TransVOS Jaccard (Seen) 82 # 6
Jaccard (Unseen) 75 # 11
F-Measure (Seen) 86.7 # 4
F-Measure (Unseen) 83.4 # 7
Mean Jaccard & F-Measure 81.8 # 7
Video Object Segmentation YouTube-VOS 2018 RMN Jaccard (Seen) 82.1 # 5
Jaccard (Unseen) 75.7 # 9
F-Measure (Seen) 85.7 # 8
F-Measure (Unseen) 82.4 # 10
Video Object Segmentation YouTube-VOS 2018 STM Jaccard (Seen) 79.7 # 14
Jaccard (Unseen) 72.8 # 13
F-Measure (Seen) 84.2 # 11
F-Measure (Unseen) 80.9 # 11
Mean Jaccard & F-Measure 79.4 # 12
Video Object Segmentation YouTube-VOS 2018 LWL Jaccard (Seen) 80.4 # 13
Jaccard (Unseen) 76.4 # 7
F-Measure (Seen) 84.9 # 10
F-Measure (Unseen) 84.4 # 6
Mean Jaccard & F-Measure 81.5 # 9
Video Object Segmentation YouTube-VOS 2018 AFB-URR Jaccard (Seen) 78.8 # 15
Jaccard (Unseen) 74.1 # 12
F-Measure (Seen) 83.1 # 12
F-Measure (Unseen) 82.6 # 9
Mean Jaccard & F-Measure 79.6 # 11
Video Object Segmentation YouTube-VOS 2018 KMN Jaccard (Seen) 81.4 # 9
Jaccard (Unseen) 75.3 # 10
F-Measure (Seen) 85.6 # 9
F-Measure (Unseen) 83.3 # 8
Mean Jaccard & F-Measure 81.4 # 10
Video Object Segmentation YouTube-VOS 2018 LCM Jaccard (Seen) 82.2 # 4
Mean Jaccard & F-Measure 82 # 6
Video Object Segmentation YouTube-VOS 2018 AOT Jaccard (Seen) 83.7 # 2
Jaccard (Unseen) 78.1 # 3
F-Measure (Seen) 88.5 # 2
F-Measure (Unseen) 86.1 # 3
Mean Jaccard & F-Measure 84.1 # 2
Video Object Segmentation YouTube-VOS 2018 RPCMVOS Jaccard (Seen) 83.1 # 3
Jaccard (Unseen) 78.5 # 2
F-Measure (Seen) 87.7 # 3
F-Measure (Unseen) 86.7 # 2
Mean Jaccard & F-Measure 84 # 3
Video Object Segmentation YouTube-VOS 2018 CFBI+ Jaccard (Seen) 81.8 # 8
Jaccard (Unseen) 77.1 # 5
F-Measure (Seen) 86.6 # 5
F-Measure (Unseen) 85.6 # 5
Mean Jaccard & F-Measure 82.8 # 5
Video Object Segmentation YouTube-VOS 2018 STCN Jaccard (Seen) 81.9 # 7
Jaccard (Unseen) 77.9 # 4
F-Measure (Seen) 86.5 # 6
F-Measure (Unseen) 85.7 # 4
Mean Jaccard & F-Measure 83 # 4
Video Object Segmentation YouTube-VOS 2018 SST Jaccard (Seen) 81.2 # 10
Jaccard (Unseen) 76 # 8
Mean Jaccard & F-Measure 81.7 # 8
Video Object Segmentation YouTube-VOS 2019 CFBI Mean Jaccard & F-Measure 81 # 10
Jaccard (Seen) 80.6 # 10
Jaccard (Unseen) 75.2 # 10
F-Measure (Seen) 85.1 # 9
F-Measure (Unseen) 83 # 9
Video Object Segmentation YouTube-VOS 2019 BATMAN Mean Jaccard & F-Measure 85 # 3
Jaccard (Seen) 84.5 # 2
Jaccard (Unseen) 79 # 4
F-Measure (Seen) 89.3 # 2
F-Measure (Unseen) 87.2 # 3
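For readers unfamiliar with the metrics in the table above, the following is a minimal sketch of how the Jaccard index (region similarity, intersection over union of masks) and the combined J&F score (the mean of Jaccard and F-measure) are computed. The function names `jaccard` and `j_and_f` are hypothetical, and treating two empty masks as a perfect match is a convention assumed here; the boundary F-measure itself is not implemented.

```python
import numpy as np

def jaccard(pred_mask, gt_mask):
    """Intersection over union of two binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def j_and_f(j, f):
    """Combined score: the arithmetic mean of Jaccard and F-measure."""
    return (j + f) / 2.0

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
print(jaccard(pred, gt))      # one shared pixel out of two in the union
print(j_and_f(90.7, 94.2))    # BATMAN on DAVIS 2016 (val), from the rounded table values
```

Averaging the rounded table entries can differ from the reported J&F in the last digit (here 92.45 against the listed 92.5), since the reported score is presumably computed from unrounded values.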

Methods