DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

18 Sep 2023 · Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou

We present DFormer, a novel RGB-D pretraining framework that learns transferable representations for RGB-D segmentation tasks. DFormer has two key innovations: 1) Unlike previous works that encode RGB-D information with an RGB-pretrained backbone, we pretrain the backbone on image-depth pairs from ImageNet-1K, which gives DFormer the capacity to encode RGB-D representations; 2) DFormer comprises a sequence of RGB-D blocks, which are tailored to encode both RGB and depth information through a novel building block design. DFormer thereby avoids the mismatched encoding of 3D geometric relationships in depth maps by RGB-pretrained backbones, a problem that is widespread in existing methods but has not been resolved. We finetune the pretrained DFormer on two popular RGB-D tasks, i.e., RGB-D semantic segmentation and RGB-D salient object detection, with a lightweight decoder head. Experimental results show that DFormer achieves new state-of-the-art performance on both tasks, across two RGB-D semantic segmentation datasets and five RGB-D salient object detection datasets, with less than half of the computational cost of the current best methods. Our code is available at: https://github.com/VCIP-RGBD/DFormer.


Results from the Paper

Task                            Dataset       Model            Metric Name    Metric Value  Global Rank
RGB-D Salient Object Detection  DES           DFormer-L        S-Measure      94.8          #1
                                                               Average MAE    0.013         #1
                                                               max E-Measure  98.0          #1
                                                               max F-Measure  95.6          #1
RGB-D Salient Object Detection  NJU2K         DFormer-L        S-Measure      93.7          #1
                                                               Average MAE    0.023         #1
                                                               max E-Measure  96.4          #1
                                                               max F-Measure  94.6          #1
RGB-D Salient Object Detection  NLPR          DFormer-L        S-Measure      94.2          #1
                                                               Average MAE    0.016         #1
                                                               max F-Measure  93.9          #1
                                                               max E-Measure  97.1          #1
Semantic Segmentation           NYU Depth v2  DFormer-L        Mean IoU       57.2%         #6
Semantic Segmentation           NYU Depth v2  DFormer-T        Mean IoU       51.8%         #38
Semantic Segmentation           NYU Depth v2  DFormer-S        Mean IoU       53.6%         #21
Semantic Segmentation           NYU Depth v2  DFormer-B        Mean IoU       55.6%         #13
RGB-D Salient Object Detection  SIP           DFormer-L        S-Measure      91.5          #1
                                                               max E-Measure  95.0          #1
                                                               max F-Measure  93.8          #1
                                                               Average MAE    0.032         #1
RGB-D Salient Object Detection  STERE         DFormer-L        S-Measure      92.3          #1
                                                               Average MAE    0.030         #1
                                                               max F-Measure  92.9          #1
                                                               max E-Measure  95.2          #1
Semantic Segmentation           SUN-RGBD      DFormer-L        Mean IoU       52.5%         #3
Semantic Segmentation           SUN-RGBD      DFormer-B        Mean IoU       51.2%         #7
Semantic Segmentation           SUN-RGBD      TokenFusion (S)  Mean IoU       50.0%         #11
Semantic Segmentation           SUN-RGBD      FSFNet           Mean IoU       48.8%         #19
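
For readers unfamiliar with two of the metrics reported above, the following is a minimal sketch of how Mean IoU and Average MAE are commonly defined. It is not taken from the DFormer code base. The function names `mean_iou` and `mean_absolute_error` are hypothetical, inputs are assumed to be flattened per-pixel lists, and benchmark evaluations usually accumulate statistics over a whole dataset rather than a single image as done here.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes (illustrative sketch).

    pred and target are equal-length sequences of integer class labels,
    one per pixel. Classes absent from both pred and target are skipped
    (an assumption; evaluation protocols differ on this point).
    """
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def mean_absolute_error(saliency, mask):
    """Average per-pixel absolute difference between a predicted saliency
    map and a ground-truth mask, both given as values in [0, 1]."""
    return sum(abs(s - m) for s, m in zip(saliency, mask)) / len(saliency)


if __name__ == "__main__":
    # Toy 4-pixel examples, not data from the paper.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
    print(mean_absolute_error([0.9, 0.1, 0.8, 0.2], [1, 0, 1, 0]))
```

Higher is better for Mean IoU, while lower is better for MAE, which is why the MAE values in the table are small fractions rather than percentages.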
