Monocular Depth Estimation
338 papers with code • 18 benchmarks • 26 datasets
Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for determining scene understanding for applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error.
Libraries
Use these libraries to find Monocular Depth Estimation models and implementationsDatasets
Latest papers
Virtually Enriched NYU Depth V2 Dataset for Monocular Depth Estimation: Do We Need Artificial Augmentation?
We present ANYU, a new virtually augmented version of the NYU depth v2 dataset, designed for monocular depth estimation.
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Recent advances in monocular depth estimation have been made by incorporating natural language as additional guidance.
RoadBEV: Road Surface Reconstruction in Bird's Eye View
This paper uniformly proposes two simple yet effective models for road elevation reconstruction in BEV named RoadBEV-mono and RoadBEV-stereo, which estimate road elevation with monocular and stereo images, respectively.
WorDepth: Variational Language Prior for Monocular Depth Estimation
To test this, we focus on monocular depth estimation, the problem of predicting a dense depth map from a single image, but with an additional text caption describing the scene.
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
In the auto-labeling stage, we represent the surface of each instance as a signed distance field (SDF) and render its silhouette as an instance mask through our proposed instance-aware volumetric silhouette rendering.
UniDepth: Universal Monocular Metric Depth Estimation
However, the remarkable accuracy of recent MMDE methods is confined to their training domains.
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures greater relevant information for SIDE than the usual route of generating pseudo image captions, followed by CLIP based text embeddings.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks.
SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images
This research paper presents an innovative multi-task learning framework that allows concurrent depth estimation and semantic segmentation using a single camera.
METER: a mobile vision transformer architecture for monocular depth estimation
State of the art MDE models typically rely on vision transformers (ViT) architectures that are highly deep and complex, making them unsuitable for fast inference on devices with hardware constraints.