Monocular Depth Estimation

339 papers with code • 17 benchmarks • 26 datasets

Monocular Depth Estimation is the task of estimating a depth value (distance from the camera) for each pixel of a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to regress the depth map directly, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using root mean squared error (RMSE) or absolute relative error.

Source: Defocus Deblurring Using Dual-Pixel Data
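
As a rough illustration of the two evaluation metrics named above, the sketch below computes RMSE and absolute relative error over the pixels that have valid ground truth. The function name `depth_metrics` and the `min_depth`/`max_depth` validity range are assumptions made for this example (an 80 m cap is a common choice for KITTI-style evaluation); individual benchmarks define their own crops, caps, and scale alignment.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Return RMSE and absolute relative error over valid ground-truth pixels.

    Hypothetical helper: the validity range is an assumption, not a
    benchmark-mandated protocol.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Ignore pixels with no usable ground truth (for example, sparse LiDAR holes).
    valid = (gt > min_depth) & (gt < max_depth)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")
    pred, gt = pred[valid], gt[valid]

    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    return {"rmse": rmse, "abs_rel": abs_rel}


if __name__ == "__main__":
    gt = [[2.0, 4.0], [0.0, 10.0]]    # 0.0 marks a pixel without ground truth
    pred = [[2.2, 3.6], [5.0, 11.0]]
    print(depth_metrics(pred, gt))    # abs_rel is about 0.1, rmse about 0.632
```

Both metrics are lower-is-better. RMSE is dominated by large errors at long range, while absolute relative error normalizes each error by the true depth, which is why the two are usually reported together.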

Benchmarks

These leaderboards are used to track progress in Monocular Depth Estimation

Dataset	Best Model
NYU-Depth V2	Metric3Dv2(L, FT)
KITTI Eigen split	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
KITTI Eigen split unsupervised	SQLdepth (ConvNeXt-L)
NYU-Depth V2 self-supervised	IndoorDepth
Mid-Air Dataset	M4Depth+U
Make3D	GCNDepth
IBims-1	Miangoleh et al. (SGR)
DDAD	AFNet
VA (Virtual Apartment)	DistDepth
Middlebury 2014	Miangoleh et al. (MiDaS)
KITTI	MonoViT
SUN-RGBD	RPSF
Cityscapes	SwinMTL
UASOL	FCRN-DepthPrediction from Iro Laina et al. (2016)
KITTI Object Tracking Evaluation 2012	PackNet-SfM
Matterport3D	NeWCRFs
Cityscapes 3D	TaskPrompter

Libraries

Use these libraries to find Monocular Depth Estimation models and implementations

huggingface/transformers • 3 papers • 125,059
SeokjuLee/Insta-DM • 3 papers • 221
ShuweiShao/NDDepth • 3 papers

Latest papers

METER: a mobile vision transformer architecture for monocular depth estimation

lorenzopapa5/meter • 13 Mar 2024

State-of-the-art MDE models typically rely on vision transformer (ViT) architectures that are very deep and complex, making them unsuitable for fast inference on devices with hardware constraints.

Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving

junda24/afnet • 12 Mar 2024

In this work, we propose a new robustness benchmark to evaluate the depth estimation system under various noisy pose settings.

D4D: An RGBD diffusion model to boost monocular depth estimation

lorenzopapa5/diffusion4d • 12 Mar 2024

Ground-truth RGBD data are fundamental for a wide range of computer vision applications; however, those labeled samples are difficult to collect and time-consuming to produce.

Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation

hitcslj/ssd • 8 Mar 2024

This paper introduces a novel approach named Stealing Stable Diffusion (SSD) prior for robust monocular depth estimation.

Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving

owen-liuyuxuan/visionfactory • 4 Mar 2024

Collectively, these contributions lay a robust foundation for the widespread adoption of vision-based 3D perception technologies in autonomous driving applications.

Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV

jspenmar/slowtv_monodepth • 3 Mar 2024

Self-supervised learning is the key to unlocking generic computer vision systems.

TIE-KD: Teacher-Independent and Explainable Knowledge Distillation for Monocular Depth Estimation

hpc-lab-koreatech/tie-kd • 22 Feb 2024

Monocular depth estimation (MDE) is essential for numerous applications yet is impeded by the substantial computational demands of accurate deep learning models.

Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting

hustvl/4DGaussians • 29 Jan 2024

In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes.

1,665

Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

LiheYoung/Depth-Anything • 19 Jan 2024

To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error.

5,668

A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy

esandml/ssl4gie • 11 Jan 2024

In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) in a range of GIE vision tasks.
