TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
RGB-D Salient Object Detection	DES	DFormer-L	S-Measure	94.8	# 1
RGB-D Salient Object Detection	DES	DFormer-L	Average MAE	0.013	# 1
RGB-D Salient Object Detection	DES	DFormer-L	max E-Measure	98.0	# 1
RGB-D Salient Object Detection	DES	DFormer-L	max F-Measure	95.6	# 1
RGB-D Salient Object Detection	NJU2K	DFormer-L	S-Measure	93.7	# 1
RGB-D Salient Object Detection	NJU2K	DFormer-L	Average MAE	0.023	# 1
RGB-D Salient Object Detection	NJU2K	DFormer-L	max E-Measure	96.4	# 1
RGB-D Salient Object Detection	NJU2K	DFormer-L	max F-Measure	94.6	# 1
RGB-D Salient Object Detection	NLPR	DFormer-L	S-Measure	94.2	# 1
RGB-D Salient Object Detection	NLPR	DFormer-L	Average MAE	0.016	# 1
RGB-D Salient Object Detection	NLPR	DFormer-L	max F-Measure	93.9	# 1
RGB-D Salient Object Detection	NLPR	DFormer-L	max E-Measure	97.1	# 1
Semantic Segmentation	NYU Depth v2	DFormer-L	Mean IoU	57.2%	# 6
Semantic Segmentation	NYU Depth v2	DFormer-T	Mean IoU	51.8%	# 38
Semantic Segmentation	NYU Depth v2	DFormer-S	Mean IoU	53.6%	# 21
Semantic Segmentation	NYU Depth v2	DFormer-B	Mean IoU	55.6%	# 13
RGB-D Salient Object Detection	SIP	DFormer-L	S-Measure	91.5	# 1
RGB-D Salient Object Detection	SIP	DFormer-L	max E-Measure	95.0	# 1
RGB-D Salient Object Detection	SIP	DFormer-L	max F-Measure	93.8	# 1
RGB-D Salient Object Detection	SIP	DFormer-L	Average MAE	0.032	# 1
RGB-D Salient Object Detection	STERE	DFormer-L	S-Measure	92.3	# 1
RGB-D Salient Object Detection	STERE	DFormer-L	Average MAE	0.030	# 1
RGB-D Salient Object Detection	STERE	DFormer-L	max F-Measure	92.9	# 1
RGB-D Salient Object Detection	STERE	DFormer-L	max E-Measure	95.2	# 1
Semantic Segmentation	SUN-RGBD	DFormer-L	Mean IoU	52.5%	# 3
Semantic Segmentation	SUN-RGBD	DFormer-B	Mean IoU	51.2%	# 7
Semantic Segmentation	SUN-RGBD	TokenFusion (S)	Mean IoU	50.0%	# 11
Semantic Segmentation	SUN-RGBD	FSFNet	Mean IoU	48.8%	# 19
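
The table reports Average MAE for saliency maps and Mean IoU for segmentation. As a rough illustration of what these two numbers measure, the sketch below computes them on tiny hand-made inputs. It assumes the commonly used definitions (MAE as the mean absolute difference between a saliency map in [0, 1] and a binary mask; mean IoU as per-class intersection over union, averaged over the classes that occur). The function names are hypothetical, and this is not the evaluation code behind the numbers above.

```python
def mean_absolute_error(saliency, mask):
    """Mean absolute difference between predicted saliency values in [0, 1]
    and a binary ground-truth mask, both given as flat sequences."""
    if len(saliency) != len(mask) or len(saliency) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(s - m) for s, m in zip(saliency, mask)) / len(saliency)


def mean_iou(pred, target, num_classes):
    """Per-class IoU averaged over the classes present in pred or target.
    Both label maps are flat sequences of integer class ids."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class in range(num_classes) occurs in the inputs")
    return sum(ious) / len(ious)


# Toy inputs (illustrative only, not taken from any benchmark).
toy_mae = mean_absolute_error([0.9, 0.1, 0.8, 0.0], [1, 0, 1, 0])
toy_miou = mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
```

Note the units in the table: Mean IoU is given as a percentage, while Average MAE is given as a fraction, where lower is better.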

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dformer-rethinking-rgbd-representation/rgb-d-salient-object-detection-on-des)](https://paperswithcode.com/sota/rgb-d-salient-object-detection-on-des?p=dformer-rethinking-rgbd-representation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dformer-rethinking-rgbd-representation/rgb-d-salient-object-detection-on-nju2k)](https://paperswithcode.com/sota/rgb-d-salient-object-detection-on-nju2k?p=dformer-rethinking-rgbd-representation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dformer-rethinking-rgbd-representation/rgb-d-salient-object-detection-on-nlpr)](https://paperswithcode.com/sota/rgb-d-salient-object-detection-on-nlpr?p=dformer-rethinking-rgbd-representation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dformer-rethinking-rgbd-representation/rgb-d-salient-object-detection-on-sip)](https://paperswithcode.com/sota/rgb-d-salient-object-detection-on-sip?p=dformer-rethinking-rgbd-representation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dformer-rethinking-rgbd-representation/rgb-d-salient-object-detection-on-stere)](https://paperswithcode.com/sota/rgb-d-salient-object-detection-on-stere?p=dformer-rethinking-rgbd-representation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dformer-rethinking-rgbd-representation/semantic-segmentation-on-sun-rgbd)](https://paperswithcode.com/sota/semantic-segmentation-on-sun-rgbd?p=dformer-rethinking-rgbd-representation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/dformer-rethinking-rgbd-representation/semantic-segmentation-on-nyu-depth-v2)](https://paperswithcode.com/sota/semantic-segmentation-on-nyu-depth-v2?p=dformer-rethinking-rgbd-representation)`
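
The badge lines above all follow one URL pattern, so they can be generated rather than copied by hand. The sketch below rebuilds that pattern from the lines listed here; the function name and the two slug parameters are hypothetical, and the pattern is inferred from this page rather than from any documented API.

```python
def pwc_badge(paper_slug, benchmark_slug):
    """Build the badge markdown for one benchmark, following the URL
    pattern of the badge lines listed on this page."""
    badge_url = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{benchmark_slug}"
    )
    link_url = f"https://paperswithcode.com/sota/{benchmark_slug}?p={paper_slug}"
    return f"[![PWC]({badge_url})]({link_url})"


# Reproduces the NYU Depth v2 badge line from the list above.
nyu_badge = pwc_badge(
    "dformer-rethinking-rgbd-representation",
    "semantic-segmentation-on-nyu-depth-v2",
)
```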

DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation

18 Sep 2023 · Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou

We present DFormer, a novel RGB-D pretraining framework to learn transferable representations for RGB-D segmentation tasks. DFormer has two key innovations: 1) Unlike previous works that encode RGB-D information with an RGB-pretrained backbone, we pretrain the backbone using image-depth pairs from ImageNet-1K, so that DFormer is endowed with the capacity to encode RGB-D representations; 2) DFormer comprises a sequence of RGB-D blocks, which are tailored for encoding both RGB and depth information through a novel building block design. DFormer thereby avoids the mismatched encoding of the 3D geometry relationships in depth maps by RGB-pretrained backbones, a problem that is widespread in existing methods but has not been resolved. We finetune the pretrained DFormer on two popular RGB-D tasks, i.e., RGB-D semantic segmentation and RGB-D salient object detection, with a lightweight decoder head. Experimental results show that DFormer achieves new state-of-the-art performance on these two tasks, across two RGB-D semantic segmentation datasets and five RGB-D salient object detection datasets, with less than half of the computational cost of the current best methods. Our code is available at: https://github.com/VCIP-RGBD/DFormer.

Code

VCIP-RGBD/DFormer

111

Tasks

Object Detection

Representation Learning

RGB-D Salient Object Detection

RGBD Semantic Segmentation

Salient Object Detection

Segmentation

Semantic Segmentation

Datasets

NYUv2

ImageNet-1K

SUN RGB-D

KITTI-360

NLPR

MFNet

SIP

NJU2K

Results from the Paper

Ranked #1 on RGB-D Salient Object Detection on DES
