Depth Estimation
777 papers with code • 13 benchmarks • 70 datasets
Depth Estimation is the task of predicting, for each pixel, its distance from the camera. Depth is extracted from either monocular (single-image) or stereo (multiple views of a scene) input. Traditional methods use multi-view geometry to relate the images. Newer methods estimate depth directly by minimizing a regression loss, or by learning to synthesize a novel view from an image sequence. The most popular benchmarks are KITTI and NYUv2, and models are typically evaluated with the root mean square error (RMSE) metric.
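As a concrete illustration of the RMSE metric mentioned above, the following sketch compares a predicted depth map against ground truth, masking out invalid pixels (datasets such as KITTI have sparse LiDAR ground truth, conventionally stored as zero where no measurement exists). The function name and the zero-means-missing convention are illustrative assumptions, not a fixed benchmark API.

```python
import numpy as np

def depth_rmse(pred, gt, mask=None):
    """Root-mean-square error between predicted and ground-truth depth maps.

    pred, gt: arrays of per-pixel depth in metres.
    mask: optional boolean array of valid ground-truth pixels;
          by default, pixels with gt == 0 are treated as missing.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if mask is None:
        mask = gt > 0  # assumed convention: zero depth marks a missing pixel
    diff = pred[mask] - gt[mask]
    return np.sqrt(np.mean(diff ** 2))

# Toy example: a 2x2 depth map with one missing ground-truth pixel
gt = np.array([[1.0, 2.0], [3.0, 0.0]])
pred = np.array([[1.1, 1.9], [3.2, 5.0]])
print(round(depth_rmse(pred, gt), 4))  # → 0.1414
```

Benchmark suites usually report RMSE alongside other metrics (absolute relative error, delta-threshold accuracies), all computed over the same valid-pixel mask.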
Libraries
Use these libraries to find Depth Estimation models and implementations.
Latest papers
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks.
When Do We Not Need Larger Vision Models?
Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models.
FeatUp: A Model-Agnostic Framework for Features at Any Resolution
Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime.
Robust Shape Fitting for 3D Scene Abstraction
A RANSAC estimator guided by a neural network fits these primitives to a depth map.
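To make the RANSAC step concrete, here is a minimal vanilla RANSAC plane fit over 3D points back-projected from a depth map. This is a generic sketch of the estimator family, not the paper's method: the neural-network-guided sampling it describes is omitted, and the function name and thresholds are illustrative assumptions.

```python
import numpy as np

def ransac_plane(points, n_iters=200, thresh=0.02, rng=None):
    """Fit a plane n.x + d = 0 to 3D points with vanilla RANSAC.

    points: (N, 3) array of 3D points (e.g. back-projected from a depth map).
    Returns (unit normal, offset d, boolean inlier mask).
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = (np.array([0.0, 0.0, 1.0]), 0.0)
    for _ in range(n_iters):
        # Hypothesize a plane from a random minimal sample of 3 points
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:  # degenerate (collinear) sample, skip
            continue
        n /= norm
        d = -n @ sample[0]
        # Score by counting points within thresh of the plane
        inliers = np.abs(points @ n + d) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_plane = inliers, (n, d)
    return best_plane[0], best_plane[1], best_inliers

# Toy example: four points on the z = 0 plane plus one outlier
pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0.5, 0.5, 3.0]],
               dtype=np.float64)
n, d, inl = ransac_plane(pts, rng=0)
print(inl.sum())  # → 4
```

A final least-squares refit over the inlier set is commonly added after the loop; learned guidance replaces the uniform sampling with a network-predicted distribution over points.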
SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images
This research paper presents an innovative multi-task learning framework that allows concurrent depth estimation and semantic segmentation using a single camera.
SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model
Third, to reduce the reliance on massive training data, we propose a "divide and conquer" solution.
METER: a mobile vision transformer architecture for monocular depth estimation
State-of-the-art MDE models typically rely on vision transformer (ViT) architectures that are deep and complex, making them unsuitable for fast inference on hardware-constrained devices.
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
In this work, we propose a new robustness benchmark to evaluate the depth estimation system under various noisy pose settings.
D4D: An RGBD diffusion model to boost monocular depth estimation
Ground-truth RGBD data are fundamental for a wide range of computer vision applications; however, those labeled samples are difficult to collect and time-consuming to produce.
Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation
This paper introduces a novel approach named Stealing Stable Diffusion (SSD) prior for robust monocular depth estimation.