TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Monocular Depth Estimation	KITTI Eigen split	ECoDepth	absolute relative error	0.048	# 9
Monocular Depth Estimation	KITTI Eigen split	ECoDepth	RMSE	1.966	# 9
Monocular Depth Estimation	KITTI Eigen split	ECoDepth	Sq Rel	0.139	# 17
Monocular Depth Estimation	KITTI Eigen split	ECoDepth	RMSE log	0.074	# 11
Monocular Depth Estimation	KITTI Eigen split	ECoDepth	Delta < 1.25	0.979	# 10
Monocular Depth Estimation	KITTI Eigen split	ECoDepth	Delta < 1.25^2	0.998	# 1
Monocular Depth Estimation	KITTI Eigen split	ECoDepth	Delta < 1.25^3	1.000	# 1
Monocular Depth Estimation	NYU-Depth V2	ECoDepth	RMSE	0.218	# 4
Monocular Depth Estimation	NYU-Depth V2	ECoDepth	absolute relative error	0.059	# 5
Monocular Depth Estimation	NYU-Depth V2	ECoDepth	Delta < 1.25	0.978	# 5
Monocular Depth Estimation	NYU-Depth V2	ECoDepth	Delta < 1.25^2	0.997	# 3
Monocular Depth Estimation	NYU-Depth V2	ECoDepth	Delta < 1.25^3	0.999	# 4
Monocular Depth Estimation	NYU-Depth V2	ECoDepth	log 10	0.026	# 5
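
The metric names in the table above follow the definitions commonly used for monocular depth benchmarks. The following is a minimal sketch of those definitions, assuming flattened lists of strictly positive predicted and ground-truth depths. The function name `depth_metrics` and its return keys are illustrative, not taken from the ECoDepth code, and the benchmark-specific evaluation protocol (valid-pixel masks, crops, depth caps) is not specified on this page and is not modeled here.

```python
import math

def depth_metrics(pred, gt):
    """Standard depth-error metrics over paired, strictly positive depths.

    Hypothetical helper for illustration only; masking and cropping
    conventions used by specific benchmarks are NOT applied here.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Errors: lower is better.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n

    # Threshold accuracies: fraction of pixels whose ratio max(p/g, g/p)
    # is below 1.25, 1.25^2, 1.25^3. Higher is better.
    ratios = [max(p / g, g / p) for p, g in pairs]
    delta = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "log10": log10,
        "delta1": delta[1],
        "delta2": delta[2],
        "delta3": delta[3],
    }
```

Under these definitions, Delta < 1.25^3 of 1.000 means that, to three decimal places, every evaluated pixel has a predicted-to-true depth ratio within a factor of 1.25^3.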

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ecodepth-effective-conditioning-of-diffusion/monocular-depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2?p=ecodepth-effective-conditioning-of-diffusion)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/ecodepth-effective-conditioning-of-diffusion/monocular-depth-estimation-on-kitti-eigen)](https://paperswithcode.com/sota/monocular-depth-estimation-on-kitti-eigen?p=ecodepth-effective-conditioning-of-diffusion)`

ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation

27 Mar 2024 · Suraj Patni, Aradhye Agarwal, Chetan Arora

In the absence of parallax cues, a learning-based single image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, it is necessary to train such models on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundational models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures more relevant information for SIDE than the usual route of generating pseudo image captions followed by CLIP-based text embeddings. Based on this idea, we propose a new SIDE model using a diffusion backbone which is conditioned on ViT embeddings. Our proposed design establishes a new state-of-the-art (SOTA) for SIDE on the NYUv2 dataset, achieving an Abs Rel error of 0.059 (14% improvement) compared to 0.069 by the current SOTA (VPD). On the KITTI dataset, it achieves a Sq Rel error of 0.139 (2% improvement) compared to 0.142 by the current SOTA (GEDepth). For zero-shot transfer with a model trained on NYUv2, we report mean relative improvements of (20%, 23%, 81%, 25%) over NeWCRFs on the (Sun-RGBD, iBims1, DIODE, HyperSim) datasets, compared to (16%, 18%, 45%, 9%) by ZoeDepth. The project page is available at https://ecodepth-iitd.github.io
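
The improvement percentages quoted in the abstract are consistent with a simple relative reduction in error. The sketch below shows that arithmetic, assuming a lower-is-better metric and rounding to the nearest whole percent. The helper name `relative_improvement` is illustrative, and the rounding convention is an assumption, since the abstract does not state one.

```python
def relative_improvement(baseline, new):
    """Relative reduction of a lower-is-better error metric.

    Hypothetical helper: returns (baseline - new) / baseline as a fraction.
    """
    if baseline <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline - new) / baseline

# Error values quoted in the abstract.
nyu_abs_rel = relative_improvement(0.069, 0.059)   # VPD -> ECoDepth, NYUv2
kitti_sq_rel = relative_improvement(0.142, 0.139)  # GEDepth -> ECoDepth, KITTI

print(f"NYUv2 Abs Rel improvement: {nyu_abs_rel:.1%}")
print(f"KITTI Sq Rel improvement: {kitti_sq_rel:.1%}")
```

Rounded to whole percentages, these work out to the 14% and 2% figures given in the abstract.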

Code

aradhye2002/ecodepth official

Tasks

Depth Estimation

Depth Prediction

Monocular Depth Estimation

Datasets

KITTI

NYUv2

SUN RGB-D

LAION-400M

Hypersim

DIODE

IBims-1

KITTI-Depth

Results from the Paper

Ranked #4 on Monocular Depth Estimation on NYU-Depth V2

Methods

Dense Connections • Diffusion • Layer Normalization • Linear Layer • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Vision Transformer
