Monocular Depth Estimation
339 papers with code • 17 benchmarks • 26 datasets
Monocular Depth Estimation is the task of estimating the depth value (distance from the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and augmented reality (AR). State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using root mean squared error (RMSE) or absolute relative error.
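As a rough illustration of the two evaluation metrics named above, the sketch below computes RMSE and absolute relative error between a predicted and a ground-truth depth map using NumPy. The function name `depth_errors`, the `min_depth` parameter, and the convention that non-positive ground-truth values mark missing pixels are assumptions made for this example; individual benchmarks define their own validity masks, depth caps, and crops.

```python
import numpy as np


def depth_errors(pred, gt, min_depth=1e-3):
    """Return (rmse, abs_rel) over pixels with valid ground truth.

    Hypothetical helper: `min_depth` and the masking rule are assumptions,
    not part of any specific benchmark protocol.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Assumption: ground-truth values at or below min_depth mean "no measurement".
    valid = gt > min_depth
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    diff = pred[valid] - gt[valid]
    # RMSE: square root of the mean squared depth error.
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = float(np.mean(np.abs(diff) / gt[valid]))
    return rmse, abs_rel


# Toy 2x2 depth maps in metres; the 0.0 entry is treated as missing ground truth.
gt = np.array([[2.0, 4.0], [0.0, 10.0]])
pred = np.array([[2.5, 3.0], [1.0, 10.0]])
rmse, abs_rel = depth_errors(pred, gt)
print(f"RMSE={rmse:.4f}  AbsRel={abs_rel:.4f}")
```

RMSE is expressed in the same units as the depth values and penalises large errors on distant pixels, while absolute relative error normalises each error by the true depth, so the two metrics can rank models differently.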
Latest papers
METER: a mobile vision transformer architecture for monocular depth estimation
State-of-the-art MDE models typically rely on vision transformer (ViT) architectures that are very deep and complex, making them unsuitable for fast inference on devices with hardware constraints.
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
In this work, we propose a new robustness benchmark to evaluate the depth estimation system under various noisy pose settings.
D4D: An RGBD diffusion model to boost monocular depth estimation
Ground-truth RGBD data are fundamental for a wide range of computer vision applications; however, those labeled samples are difficult to collect and time-consuming to produce.
Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation
This paper introduces a novel approach named Stealing Stable Diffusion (SSD) prior for robust monocular depth estimation.
Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving
Collectively, these contributions lay a robust foundation for the widespread adoption of vision-based 3D perception technologies in autonomous driving applications.
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV
Self-supervised learning is the key to unlocking generic computer vision systems.
TIE-KD: Teacher-Independent and Explainable Knowledge Distillation for Monocular Depth Estimation
Monocular depth estimation (MDE) is essential for numerous applications yet is impeded by the substantial computational demands of accurate deep learning models.
Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting
In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes.
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error.
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) in a range of GIE vision tasks.