Monocular Depth Estimation
339 papers with code • 18 benchmarks • 26 datasets
Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a network powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error.
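The two evaluation metrics mentioned above can be illustrated with a minimal sketch in NumPy. The function name, the toy depth values, and the convention of marking missing ground truth with zeros are assumptions for illustration; actual benchmark evaluations (e.g. on KITTI) also apply dataset-specific cropping and depth caps.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Compute RMSE and absolute relative error over valid (gt > 0) pixels."""
    mask = gt > 0                       # ignore pixels with no ground-truth depth
    pred, gt = pred[mask], gt[mask]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    return rmse, abs_rel

# Toy example with synthetic depth maps (values in metres; hypothetical data)
gt = np.array([[2.0, 4.0], [8.0, 0.0]])     # 0.0 marks a missing measurement
pred = np.array([[2.2, 3.8], [7.0, 5.0]])
rmse, abs_rel = depth_metrics(pred, gt)
```

RMSE penalizes large absolute errors (often dominated by distant pixels), while absolute relative error normalizes by the true depth, weighting near and far regions more evenly.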
Libraries
Use these libraries to find Monocular Depth Estimation models and implementations.
Latest papers with no code
$\mathrm{F^2Depth}$: Self-supervised Indoor Monocular Depth Estimation via Optical Flow Consistency and Feature Map Synthesis
To evaluate the generalization ability of our $\mathrm{F^2Depth}$, we collect a Campus Indoor depth dataset composed of approximately 1500 points selected from 99 images in 18 scenes.
Track Everything Everywhere Fast and Robustly
We propose a novel test-time optimization approach for efficiently and robustly tracking any pixel at any time in a video.
Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
Monocular depth estimation in endoscopy videos can enable assistive and robotic surgery to obtain better coverage of the organ and detection of various health issues.
Language-Based Depth Hints for Monocular Depth Estimation
In this work, we demonstrate the use of natural language as a source of an explicit prior about the structure of the world.
DepthFM: Fast Monocular Depth Estimation with Flow Matching
Due to the generative nature of our approach, our model reliably predicts the confidence of its depth estimates.
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by learning to predict the future during training.
SSAP: A Shape-Sensitive Adversarial Patch for Comprehensive Disruption of Monocular Depth Estimation in Autonomous Navigation Applications
In this paper, we introduce SSAP (Shape-Sensitive Adversarial Patch), a novel approach designed to comprehensively disrupt monocular depth estimation (MDE) in autonomous navigation applications.
Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting
Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene.
DD-VNB: A Depth-based Dual-Loop Framework for Real-time Visually Navigated Bronchoscopy
Specifically, the relative pose changes are fed into the registration process as the initial guess to boost its accuracy and speed.
Pyramid Feature Attention Network for Monocular Depth Prediction
Deep convolutional neural networks (DCNNs) have achieved great success in monocular depth estimation (MDE).