Depth Estimation
799 papers with code • 14 benchmarks • 70 datasets
Depth Estimation is the task of estimating, for each pixel, the distance from the camera to the scene point it depicts. Depth is estimated from either monocular (single-view) or stereo (two or more views of a scene) images. Traditional methods use multi-view geometry to relate corresponding points across the images. Newer learning-based methods estimate depth directly, either by minimizing a regression loss against ground-truth depth or by learning to synthesize a novel view from an image sequence. The most popular benchmarks are KITTI and NYUv2. Models are typically evaluated with a root-mean-square (RMS) error metric.
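As a rough illustration of the RMS error mentioned above, the following sketch computes it over a flattened depth map in plain Python. The function name `depth_rmse`, the `min_depth` and `max_depth` parameters, and the convention that missing ground truth is stored as 0 are assumptions made for this example. They are not taken from any particular benchmark's evaluation code, and individual benchmarks may apply their own crops, depth caps, or units.

```python
import math
from typing import Optional, Sequence


def depth_rmse(
    pred: Sequence[float],
    target: Sequence[float],
    min_depth: float = 1e-3,
    max_depth: Optional[float] = None,
) -> float:
    """Root-mean-square error over pixels that have valid ground truth.

    pred and target are flattened depth maps in the same unit (e.g. metres).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of pixels")

    squared_errors = []
    for p, t in zip(pred, target):
        # Assumption: pixels without ground truth are stored as values
        # below min_depth (commonly 0), so they are skipped.
        if t < min_depth:
            continue
        # Optionally ignore ground truth beyond a hypothetical depth cap.
        if max_depth is not None and t > max_depth:
            continue
        squared_errors.append((p - t) ** 2)

    if not squared_errors:
        raise ValueError("no valid ground-truth pixels")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Four pixels; the third has no ground truth and is ignored.
predicted = [2.0, 4.0, 10.0, 7.0]
ground_truth = [1.0, 4.0, 0.0, 9.0]
print(depth_rmse(predicted, ground_truth))
```

Masking invalid pixels matters in practice because ground-truth depth from sensors such as LiDAR is often sparse, and averaging over unmeasured pixels would distort the score.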
Libraries
Use these libraries to find Depth Estimation models and implementations
Subtasks
Latest papers
UniDepth: Universal Monocular Metric Depth Estimation
However, the remarkable accuracy of recent MMDE methods is confined to their training domains.
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures greater relevant information for SIDE than the usual route of generating pseudo image captions, followed by CLIP based text embeddings.
ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time.
DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing
3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Deep learning-based monocular depth estimation (MDE), extensively applied in autonomous driving, is known to be vulnerable to adversarial attacks.
When Do We Not Need Larger Vision Models?
Our results show that a multi-scale smaller model has comparable learning capacity to a larger model, and pre-training smaller models with S$^2$ can match or even exceed the advantage of larger models.
FeatUp: A Model-Agnostic Framework for Features at Any Resolution
Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime.
Robust Shape Fitting for 3D Scene Abstraction
A RANSAC estimator guided by a neural network fits these primitives to a depth map.
SwinMTL: A Shared Architecture for Simultaneous Depth Estimation and Semantic Segmentation from Monocular Camera Images
This research paper presents an innovative multi-task learning framework that allows concurrent depth estimation and semantic segmentation using a single camera.
SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model
Third, to reduce the reliance on massive training data, we propose a "divide and conquer" solution.