Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Monocular Depth Estimation	Cityscapes	SwinMTL	Absolute relative error (AbsRel)	0.089	#1
Monocular Depth Estimation	Cityscapes	SwinMTL	RMSE	5.481	#1
Monocular Depth Estimation	Cityscapes	SwinMTL	RMSE log	0.139	#1
Monocular Depth Estimation	Cityscapes	SwinMTL	Square relative error (SqRel)	1.051	#1
Semantic Segmentation	NYU Depth v2	SwinMTL	Mean IoU	58.14%	#4
Multi-Task Learning	NYUv2	SwinMTL	Mean IoU	58.14%	#1
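
The metric names in the table above follow conventions that are widely used for depth and segmentation benchmarks. As a rough illustration, the sketch below computes them in plain Python using the commonly cited definitions. The function names `depth_metrics` and `mean_iou` are hypothetical, and the exact evaluation protocol behind the leaderboard numbers (depth caps, cropping, ignored classes) is not stated on this page, so this should not be read as the benchmark's own evaluation code.

```python
import math


def depth_metrics(pred, gt):
    """Common monocular-depth error metrics over paired depth values.

    Assumption: pixels with non-positive ground truth or prediction are
    skipped; the benchmark's actual masking rules are not given here.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid depth pairs")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    return {"AbsRel": abs_rel, "SqRel": sq_rel, "RMSE": rmse, "RMSE log": rmse_log}


def mean_iou(pred_labels, gt_labels, num_classes):
    """Mean intersection-over-union across classes that occur at least once."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred_labels, gt_labels) if p == c and g == c)
        union = sum(1 for p, g in zip(pred_labels, gt_labels) if p == c or g == c)
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class present in prediction or ground truth")
    return sum(ious) / len(ious)
```

For the depth metrics lower is better, while for Mean IoU higher is better; the leaderboard reports Mean IoU as a percentage, so the fraction returned here would be multiplied by 100.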

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swinmtl-a-shared-architecture-for/monocular-depth-estimation-on-cityscapes)](https://paperswithcode.com/sota/monocular-depth-estimation-on-cityscapes?p=swinmtl-a-shared-architecture-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swinmtl-a-shared-architecture-for/multi-task-learning-on-nyuv2)](https://paperswithcode.com/sota/multi-task-learning-on-nyuv2?p=swinmtl-a-shared-architecture-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swinmtl-a-shared-architecture-for/semantic-segmentation-on-nyu-depth-v2)](https://paperswithcode.com/sota/semantic-segmentation-on-nyu-depth-v2?p=swinmtl-a-shared-architecture-for)`

SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images

15 Mar 2024 · Pardis Taghavi, Reza Langari, Gaurav Pandey

This paper presents a multi-task learning framework that performs depth estimation and semantic segmentation concurrently from a single camera. The approach is based on a shared encoder-decoder architecture, which integrates various techniques to improve the accuracy of both tasks without compromising computational efficiency. The framework also incorporates an adversarial training component, a Wasserstein GAN with a critic network, to refine the model's predictions. It is evaluated on two datasets, the outdoor Cityscapes dataset and the indoor NYU Depth V2 dataset, and it outperforms existing state-of-the-art methods in both segmentation and depth estimation. Ablation studies analyze the contributions of individual components, including pre-training strategies, the inclusion of critics, logarithmic depth scaling, and advanced image augmentations. The accompanying source code is available at https://github.com/PardisTaghavi/SwinMTL.
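
To make the adversarial component mentioned in the abstract more concrete, the following is a minimal sketch of the generic Wasserstein GAN objectives, assuming the critic outputs one scalar score per sample. It is not the authors' implementation: the function names `critic_loss` and `generator_loss` are hypothetical, and the abstract does not say how the critic is constrained (Wasserstein training generally needs a Lipschitz constraint such as weight clipping or a gradient penalty) or how the adversarial term is weighted against the task losses.

```python
def critic_loss(real_scores, fake_scores):
    """Generic Wasserstein critic loss.

    Minimizing this pushes critic scores up on real samples (assumed here to
    be ground-truth maps) and down on fake samples (assumed to be network
    predictions).
    """
    mean_real = sum(real_scores) / len(real_scores)
    mean_fake = sum(fake_scores) / len(fake_scores)
    return mean_fake - mean_real


def generator_loss(fake_scores):
    """Generic Wasserstein generator loss: raise the critic's score on predictions."""
    return -sum(fake_scores) / len(fake_scores)


# Hypothetical critic scores, for illustration only.
real = [1.0, 3.0]
fake = [0.0, 1.0]
print(critic_loss(real, fake))   # -1.5
print(generator_loss(fake))      # -0.5
```

In a multi-task setting such as the one described, the generator term would typically be added to the depth and segmentation losses with some weight; that weight is not given on this page.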

Code

pardistaghavi/swinmtl official

Tasks

Computational Efficiency

Depth Estimation

Monocular Depth Estimation

Multi-Task Learning

Segmentation

Semantic Segmentation

Datasets

Cityscapes

NYUv2

Results from the Paper

Ranked #1 on Monocular Depth Estimation on Cityscapes

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
