Monocular Depth Estimation
328 papers with code • 18 benchmarks • 26 datasets
Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error.
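The evaluation metrics mentioned above can be sketched in a few lines. This is a minimal illustration using the definitions conventionally reported on KITTI and NYUv2 (RMSE, absolute relative error, and the δ < 1.25 accuracy); the function name and the masking of invalid pixels are assumptions, not any benchmark's official implementation.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Sketch of common monocular depth metrics (conventional definitions).

    pred, gt: arrays of predicted and ground-truth depths in meters.
    Pixels with gt <= 0 are treated as having no ground truth and are ignored.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]

    rmse = np.sqrt(np.mean((pred - gt) ** 2))          # root mean squared error
    abs_rel = np.mean(np.abs(pred - gt) / gt)          # absolute relative error
    # delta accuracy: fraction of pixels where max(pred/gt, gt/pred) < 1.25
    delta1 = np.mean(np.maximum(pred / gt, gt / pred) < 1.25)
    return rmse, abs_rel, delta1
```

For example, `depth_metrics([1.0, 2.0, 4.0], [1.0, 2.0, 2.0])` yields an absolute relative error of 1/3, since only the third pixel is wrong and its error equals its ground-truth depth.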
Libraries
Use these libraries to find Monocular Depth Estimation models and implementations.

Datasets
Most implemented papers
DINOv2: Learning Robust Visual Features without Supervision
The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
Index Network
By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.
Deep Ordinal Regression Network for Monocular Depth Estimation
These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions.
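Instead of regressing depth directly, DORN discretizes the depth range into ordered intervals and treats prediction as ordinal classification. A minimal sketch of the spacing-increasing discretization (SID) described in the paper follows; the helper names are hypothetical, and only the bin construction is shown, not the network or ordinal loss.

```python
import numpy as np

def sid_bins(alpha, beta, k):
    """Spacing-increasing discretization: k bin edges spaced uniformly
    in log-depth between alpha (near) and beta (far), so bins grow
    wider at larger depths where estimates are less certain."""
    i = np.arange(k + 1)
    return np.exp(np.log(alpha) + np.log(beta / alpha) * i / k)

def depth_to_ordinal_label(depth, edges):
    """Map a continuous depth value to the index of its ordinal bin."""
    idx = np.searchsorted(edges, depth, side="right") - 1
    return int(np.clip(idx, 0, len(edges) - 2))
```

With `alpha=1.0, beta=8.0, k=3`, the edges come out as powers of two (1, 2, 4, 8), so a depth of 3 m falls in bin 1.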
Unsupervised Monocular Depth Learning in Dynamic Scenes
We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision.
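The photometric-consistency supervision referred to above compares the target frame against a source frame warped into the target view using the predicted depth and ego-motion. A minimal sketch of the loss term, assuming the warping has already been done (methods in this family typically also add an SSIM term and occlusion handling, omitted here):

```python
import numpy as np

def photometric_loss(target, warped, mask=None):
    """Mean absolute photometric error between the target image and a
    source image warped into the target view. `mask` optionally excludes
    pixels that are occluded or project outside the source frame."""
    diff = np.abs(np.asarray(target, dtype=float) - np.asarray(warped, dtype=float))
    if mask is not None:
        diff = diff[np.asarray(mask, dtype=bool)]
    return float(diff.mean())
```

Minimizing this error over depth and motion parameters is what allows the whole pipeline to train without any depth labels.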
Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture
In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling.
Fast Robust Monocular Depth Estimation for Obstacle Detection with Fully Convolutional Networks
We propose a novel appearance-based Object Detection system that is able to detect obstacles at very long range and at a very high speed (~300Hz), without making assumptions on the type of motion.
Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries
Experimental results show that these two improvements attain higher accuracy than the current state of the art, owing to finer-resolution reconstruction, for example around small objects and object boundaries.
Towards real-time unsupervised monocular depth estimation on CPU
To tackle this issue, in this paper we propose a novel architecture capable of quickly inferring an accurate depth map on a CPU, even on an embedded system, using a pyramid of features extracted from a single input image.
Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations
Deployment of deep learning models in robotics as sensory information extractors can be a daunting task to handle, even using generic GPU cards.
Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells
While most results in this domain have been achieved on image classification and language modelling problems, here we concentrate on dense per-pixel tasks, in particular, semantic image segmentation using fully convolutional networks.