TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Thermal Image Segmentation	MFN Dataset	DPLNet	mIoU	59.3	# 5
Semantic Segmentation	NYU Depth v2	DPLNet	Mean IoU	59.3	# 2
Thermal Image Segmentation	PST900	DPLNet	mIoU	86.7	# 4
Semantic Segmentation	SUN-RGBD	DPLNet	Mean IoU	52.8	# 2
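
The metric in the table above is mean intersection-over-union. As a rough illustration of how such a score is commonly computed, the sketch below builds a confusion matrix from flattened label lists and averages the per-class IoU. The function names and the toy labels are hypothetical, and each benchmark's official protocol (ignored classes, per-dataset accumulation) may differ from this minimal version.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels per (true class, predicted class) pair."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Average IoU = TP / (TP + FP + FN) over classes that occur at all."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # true c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, true otherwise
        denom = tp + fp + fn
        if denom > 0:                                # skip classes absent from both
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (hypothetical labels, not benchmark data).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = mean_iou(confusion_matrix(y_true, y_pred, num_classes=3))
print(f"mIoU = {100 * score:.1f}")
```

Leaderboards usually report the score multiplied by 100, as in the metric values above.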

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/efficient-multimodal-semantic-segmentation/semantic-segmentation-on-nyu-depth-v2)](https://paperswithcode.com/sota/semantic-segmentation-on-nyu-depth-v2?p=efficient-multimodal-semantic-segmentation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/efficient-multimodal-semantic-segmentation/semantic-segmentation-on-sun-rgbd)](https://paperswithcode.com/sota/semantic-segmentation-on-sun-rgbd?p=efficient-multimodal-semantic-segmentation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/efficient-multimodal-semantic-segmentation/thermal-image-segmentation-on-pst900)](https://paperswithcode.com/sota/thermal-image-segmentation-on-pst900?p=efficient-multimodal-semantic-segmentation)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/efficient-multimodal-semantic-segmentation/thermal-image-segmentation-on-mfn-dataset)](https://paperswithcode.com/sota/thermal-image-segmentation-on-mfn-dataset?p=efficient-multimodal-semantic-segmentation)`

Efficient Multimodal Semantic Segmentation via Dual-Prompt Learning

1 Dec 2023 · Shaohua Dong, Yunhe Feng, Qing Yang, Yan Huang, Dongfang Liu, Heng Fan

Multimodal (e.g., RGB-Depth/RGB-Thermal) fusion has shown great potential for improving semantic segmentation in complex scenes (e.g., indoor/low-light conditions). Existing approaches often fully fine-tune a dual-branch encoder-decoder framework with a complicated feature fusion strategy to achieve multimodal semantic segmentation, which is costly to train because of the massive parameter updates in feature extraction and fusion. To address this issue, we propose a surprisingly simple yet effective dual-prompt learning network (dubbed DPLNet) for training-efficient multimodal (e.g., RGB-D/T) semantic segmentation. The core of DPLNet is to directly adapt a frozen pre-trained RGB model to multimodal semantic segmentation, reducing parameter updates. For this purpose, we present two prompt learning modules: a multimodal prompt generator (MPG) and a multimodal feature adapter (MFA). MPG fuses the features from different modalities in a compact manner and is inserted from shallow to deep stages to generate multi-level multimodal prompts that are injected into the frozen backbone, while MFA adapts the prompted multimodal features in the frozen backbone for better multimodal semantic segmentation. Since both MPG and MFA are lightweight, only a few trainable parameters (3.88M, 4.4% of the pre-trained backbone parameters) are introduced for multimodal feature fusion and learning. Using a simple decoder (3.27M parameters), DPLNet achieves new state-of-the-art performance or is on a par with other complex approaches on four RGB-D/T semantic segmentation datasets while remaining parameter-efficient. Moreover, we show that DPLNet is general and applicable to other multimodal tasks such as salient object detection and video semantic segmentation. Without special design, DPLNet outperforms many complicated models. Our code will be available at github.com/ShaohuaDong2021/DPLNet.
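
The parameter figures quoted in the abstract can be combined into a rough budget. The sketch below is arithmetic only: it takes the three stated numbers (3.88M prompt-module parameters, a 4.4% share of the backbone, a 3.27M decoder) and derives an approximate backbone size and trainable fraction. The derived values are not stated in the abstract, the 4.4% figure is rounded so the backbone size is approximate, and treating the decoder as trainable is an assumption.

```python
# Figures quoted in the abstract (in millions of parameters).
PROMPT_PARAMS_M = 3.88     # MPG + MFA modules combined
PROMPT_SHARE = 0.044       # prompt modules as a share of the frozen backbone
DECODER_PARAMS_M = 3.27    # simple decoder

# Derived, approximate values (not stated in the abstract).
backbone_params_m = PROMPT_PARAMS_M / PROMPT_SHARE

# Assumption: the decoder is trained together with the prompt modules.
trainable_params_m = PROMPT_PARAMS_M + DECODER_PARAMS_M

total_params_m = backbone_params_m + trainable_params_m
trainable_fraction = trainable_params_m / total_params_m

print(f"implied frozen backbone: ~{backbone_params_m:.1f}M")
print(f"trainable (prompts + decoder): {trainable_params_m:.2f}M")
print(f"trainable share of whole model: ~{100 * trainable_fraction:.1f}%")
```

Under these assumptions, only a small share of the full model would receive gradient updates, which is the training-efficiency argument the abstract makes against fully fine-tuning a dual-branch network.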

Code

shaohuadong2021/dplnet official

Tasks

Object Detection

RGBD Semantic Segmentation

Salient Object Detection

Segmentation

Semantic Segmentation

Thermal Image Segmentation

Video Semantic Segmentation

Datasets

NYUv2

SUN RGB-D

NLPR

LFSD

MFNet

SIP

NJU2K

PST900

VT5000

Results from the Paper

Ranked #2 on Semantic Segmentation on SUN-RGBD (using extra training data)

Methods

Adapter
