AsymFormer: Asymmetrical Cross-Modal Representation Learning for Mobile Platform Real-Time RGB-D Semantic Segmentation

25 Sep 2023 · Siqi Du, Weixi Wang, Renzhong Guo, Ruisheng Wang, Yibin Tian, Shengjun Tang

Understanding indoor scenes is crucial for urban studies. Given the dynamic nature of indoor environments, effective semantic segmentation requires both real-time operation and high accuracy. To address this, we propose AsymFormer, a novel network that improves real-time semantic segmentation accuracy using RGB-D multi-modal information without substantially increasing network complexity. AsymFormer uses an asymmetrical backbone for multi-modal feature extraction, reducing redundant parameters by optimizing the distribution of computational resources. To fuse the asymmetric multi-modal features, a Local Attention-Guided Feature Selection (LAFS) module selectively fuses features from different modalities by leveraging their dependencies. A Cross-Modal Attention-Guided Feature Correlation Embedding (CMA) module is then introduced to further extract cross-modal representations. AsymFormer achieves competitive results with 54.1% mIoU on NYUv2 and 49.1% mIoU on SUNRGBD. Notably, it reaches an inference speed of 65 FPS (79 FPS after mixed-precision quantization) on an RTX 3090, demonstrating that AsymFormer strikes a balance between high accuracy and efficiency.
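
To make the fusion pipeline above concrete, the following is a minimal, hypothetical PyTorch sketch of an asymmetric RGB-D encoder stage: a heavier branch for RGB, a lighter branch for depth, a LAFS-style block that re-weights the two modalities before summing them, and a CMA-style cross-attention block over the fused tokens. The class names mirror the paper's terminology, but every layer size and internal detail here is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LAFS(nn.Module):
    """Feature-selection sketch: a shared gate re-weights RGB and depth
    channels before they are summed (internals are assumptions)."""
    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, 2 * channels), nn.Sigmoid())

    def forward(self, rgb, depth):
        b, c, _, _ = rgb.shape
        w = self.gate(self.pool(torch.cat([rgb, depth], 1)).flatten(1))
        return rgb * w[:, :c, None, None] + depth * w[:, c:, None, None]


class CMA(nn.Module):
    """Cross-modal attention sketch: fused tokens query the concatenated
    RGB/depth tokens (internals are assumptions)."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, fused, rgb, depth):
        b, c, h, w = fused.shape
        q = fused.flatten(2).transpose(1, 2)                        # B, HW, C
        kv = torch.cat([rgb, depth], 2).flatten(2).transpose(1, 2)  # B, 2HW, C
        out = self.norm(self.attn(q, kv, kv)[0] + q)
        return out.transpose(1, 2).reshape(b, c, h, w)


def conv_stage(cin, cout, blocks):
    """Plain conv stack standing in for a backbone stage (downsamples by 2)."""
    layers = [nn.Conv2d(cin, cout, 3, 2, 1), nn.BatchNorm2d(cout), nn.ReLU(True)]
    for _ in range(blocks - 1):
        layers += [nn.Conv2d(cout, cout, 3, 1, 1), nn.BatchNorm2d(cout), nn.ReLU(True)]
    return nn.Sequential(*layers)


class AsymmetricFusionStage(nn.Module):
    """One encoder stage: a deeper RGB branch, a slimmer depth branch,
    then LAFS selection and CMA correlation. Block counts are assumptions."""
    def __init__(self, cin_rgb, cin_depth, cout):
        super().__init__()
        self.rgb_branch = conv_stage(cin_rgb, cout, blocks=3)       # heavier branch
        self.depth_branch = conv_stage(cin_depth, cout, blocks=1)   # lighter branch
        self.lafs = LAFS(cout)
        self.cma = CMA(cout)

    def forward(self, rgb, depth):
        rgb, depth = self.rgb_branch(rgb), self.depth_branch(depth)
        return self.cma(self.lafs(rgb, depth), rgb, depth)


if __name__ == "__main__":
    stage = AsymmetricFusionStage(3, 1, 64)
    out = stage(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))  # toy input size
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```

Stacking several such stages at decreasing resolution, followed by a lightweight decoder, would give the encoder-decoder layout the abstract describes; the asymmetry simply spends more parameters on the RGB branch than on the depth branch.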

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Semantic Segmentation | NYU Depth v2 | AsymFormer | Mean IoU | 54.1% | #19
Real-Time Semantic Segmentation | NYU Depth v2 | AsymFormer | mIoU | 54.1 | #1
Real-Time Semantic Segmentation | NYU Depth v2 | AsymFormer | Speed (ms/f) | 15.3 | #2
Real-Time Semantic Segmentation | NYU Depth v2 | AsymFormer | Speed (FPS) | 65.5 (RTX 3090) | #1
Semantic Segmentation | SUN-RGBD | AsymFormer | Mean IoU | 49.1% | #17
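
For context on the speed rows: ms/frame and FPS are reciprocals (15.3 ms/f ≈ 65 FPS), and the higher 79 FPS figure in the abstract comes from running inference at reduced precision. Below is a hedged sketch of how such a latency/FPS benchmark is commonly set up in PyTorch; the `benchmark` helper, the warm-up count, the 4-channel placeholder input, and the FP16 conversion are illustrative assumptions rather than the paper's measurement protocol.

```python
import time
import torch


@torch.no_grad()
def benchmark(model, input_shape=(1, 4, 480, 640), runs=100, fp16=False):
    """Illustrative GPU latency/FPS measurement (not the authors' protocol)."""
    model = model.cuda().eval()
    x = torch.randn(input_shape, device="cuda")
    if fp16:                       # reduced-precision inference
        model, x = model.half(), x.half()
    for _ in range(10):            # warm-up so timings exclude one-time init costs
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(runs):
        model(x)
    torch.cuda.synchronize()
    ms_per_frame = (time.time() - start) / runs * 1e3
    return ms_per_frame, 1e3 / ms_per_frame   # FPS is the reciprocal of latency


# Example with a placeholder network taking a 4-channel RGB-D input:
# ms, fps = benchmark(torch.nn.Conv2d(4, 40, 3, padding=1), fp16=True)
```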