Monocular Depth Estimation

328 papers with code • 18 benchmarks • 26 datasets

Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and AR. State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using RMSE or absolute relative error.

Source: Defocus Deblurring Using Dual-Pixel Data
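The evaluation protocol above boils down to two numbers. Below is a minimal sketch of RMSE and absolute relative error computed over valid ground-truth pixels; the 80 m depth cap and the function name are illustrative choices, not part of any official benchmark toolkit.

```python
import numpy as np

def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Compute RMSE and absolute relative error over valid ground-truth pixels.

    pred, gt: float arrays of per-pixel depth in metres (same shape).
    Ground-truth pixels outside [min_depth, max_depth] are ignored, as is
    common practice in KITTI/NYUv2-style evaluations.
    """
    valid = (gt > min_depth) & (gt < max_depth)
    pred, gt = pred[valid], gt[valid]

    rmse = np.sqrt(np.mean((pred - gt) ** 2))   # root mean squared error
    abs_rel = np.mean(np.abs(pred - gt) / gt)   # absolute relative error
    return {"rmse": rmse, "abs_rel": abs_rel}
```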

Most implemented papers

DINOv2: Learning Robust Visual Features without Supervision

facebookresearch/dinov2 14 Apr 2023

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision.
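For depth estimation, DINOv2 is typically used as a frozen backbone whose patch features feed a depth head. A minimal sketch of feature extraction via torch.hub follows, assuming the dinov2_vits14 hub entry point and the x_norm_patchtokens key exposed by the repository's forward_features; the depth head itself is omitted.

```python
import torch

# Load a small DINOv2 backbone from the official repo (downloads weights on first use).
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
model.eval()

# Dummy RGB batch; height and width must be multiples of the 14-pixel patch size.
x = torch.randn(1, 3, 224, 224)

with torch.no_grad():
    feats = model.forward_features(x)
    # Per-patch tokens: (batch, num_patches, embed_dim) = (1, 256, 384) for ViT-S/14 at 224x224.
    patch_tokens = feats["x_norm_patchtokens"]

# Reshape into a 2-D feature map that a convolutional depth head could consume.
h = w = 224 // 14
fmap = patch_tokens.reshape(1, h, w, -1).permute(0, 3, 1, 2)  # (1, 384, 16, 16)
print(fmap.shape)
```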

Index Network

poppinace/indexnet_matting 11 Aug 2019

By viewing the indices as a function of the feature map, we introduce the concept of "learning to index", and present a novel index-guided encoder-decoder framework where indices are self-learned adaptively from data and are used to guide the downsampling and upsampling stages, without extra training supervision.
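A toy rendering of the "learning to index" idea, with a learned index map shared between one downsampling and one upsampling stage; module and variable names are illustrative and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IndexBlock(nn.Module):
    """Toy index-guided encoder/decoder stage (illustrative, not the paper's design).

    A small conv predicts a per-pixel index map from the input features. The same
    map weights the features before 2x downsampling and re-weights them again after
    2x upsampling, so the down- and up-stages share learned indices."""
    def __init__(self, channels):
        super().__init__()
        self.index_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        idx = torch.sigmoid(self.index_conv(x))          # learned index map in [0, 1]
        down = F.avg_pool2d(x * idx, kernel_size=2) * 4  # index-guided downsampling
        return down, idx

    @staticmethod
    def upsample(x, idx):
        up = F.interpolate(x, scale_factor=2, mode="nearest")
        return up * idx                                   # index-guided upsampling

# Example: one encoder/decoder stage.
block = IndexBlock(channels=16)
feat = torch.randn(1, 16, 64, 64)
down, idx = block(feat)              # (1, 16, 32, 32), (1, 16, 64, 64)
up = IndexBlock.upsample(down, idx)  # (1, 16, 64, 64)
print(down.shape, up.shape)
```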

Deep Ordinal Regression Network for Monocular Depth Estimation

hufu6371/DORN CVPR 2018

These methods model depth estimation as a regression problem and train the regression networks by minimizing mean squared error, which suffers from slow convergence and unsatisfactory local solutions.
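DORN replaces that regression objective with ordinal regression over discretized depth. The sketch below illustrates spacing-increasing discretization and an ordinal loss framed as a stack of binary "is the depth greater than this bin edge" classifications; it conveys the idea rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def sid_thresholds(alpha=1.0, beta=80.0, num_bins=80):
    """Spacing-increasing discretization: bin edges grow exponentially with depth."""
    i = torch.arange(num_bins + 1, dtype=torch.float32)
    return torch.exp(torch.log(torch.tensor(alpha)) + i / num_bins * torch.log(torch.tensor(beta / alpha)))

def ordinal_labels(depth, thresholds):
    """Ordinal label = number of interior bin edges the true depth exceeds."""
    return (depth.unsqueeze(-1) > thresholds[1:-1]).long().sum(-1)

def ordinal_loss(logits, depth, thresholds):
    """Ordinal regression as K-1 binary classifications: is depth > edge k?

    logits: (N, K-1) scores, one per interior bin edge; depth: (N,) metric depths."""
    k = logits.shape[1]
    labels = ordinal_labels(depth, thresholds)                 # (N,) in [0, K-1]
    targets = (torch.arange(k) < labels.unsqueeze(1)).float()  # (N, K-1) "greater than" indicators
    return F.binary_cross_entropy_with_logits(logits, targets)

# Example with 80 bins on a KITTI-like depth range.
th = sid_thresholds(alpha=1.0, beta=80.0, num_bins=80)
loss = ordinal_loss(torch.randn(4, 79), torch.tensor([2.0, 10.0, 35.0, 70.0]), th)
print(loss.item())
```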

Unsupervised Monocular Depth Learning in Dynamic Scenes

google-research/google-research 30 Oct 2020

We present a method for jointly training the estimation of depth, ego-motion, and a dense 3D translation field of objects relative to the scene, with monocular photometric consistency being the sole source of supervision.
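The photometric consistency term can be sketched as an SSIM-plus-L1 comparison between the target frame and a source frame warped into the target view. The warping with predicted depth, ego-motion, and object translation is assumed to happen upstream and is not shown here.

```python
import torch
import torch.nn.functional as F

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM over 3x3 neighbourhoods, as commonly used in self-supervised depth."""
    mu_x, mu_y = F.avg_pool2d(x, 3, 1, 1), F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)

def photometric_loss(target, warped, alpha=0.85):
    """Photometric consistency between the target frame and a warped source frame.

    `warped` is assumed to be the source image already reprojected into the target
    view using predicted depth, ego-motion, and the per-object translation field."""
    l1 = torch.abs(target - warped)
    return (alpha * ssim(target, warped) + (1 - alpha) * l1).mean()

# Example with dummy frames standing in for the real warped reconstruction.
target, warped = torch.rand(1, 3, 128, 416), torch.rand(1, 3, 128, 416)
print(photometric_loss(target, warped).item())
```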

Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture

yhlleo/DeepSegmentor ICCV 2015

In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling.
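A toy illustration of the shared-architecture idea: one common encoder with three per-pixel heads for depth, normals, and semantic labels. The real model is a multi-scale coarse-to-fine network, so the layer sizes here are placeholders.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Toy shared-encoder network with three per-pixel heads (depth, normals, labels)."""
    def __init__(self, num_classes=40):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)            # per-pixel depth
        self.normal_head = nn.Conv2d(64, 3, 3, padding=1)           # per-pixel surface normal
        self.label_head = nn.Conv2d(64, num_classes, 3, padding=1)  # per-pixel class logits

    def forward(self, x):
        f = self.encoder(x)
        normals = nn.functional.normalize(self.normal_head(f), dim=1)  # unit-length normals
        return self.depth_head(f), normals, self.label_head(f)

depth, normals, labels = MultiTaskNet()(torch.randn(1, 3, 240, 320))
print(depth.shape, normals.shape, labels.shape)  # (1,1,60,80) (1,3,60,80) (1,40,60,80)
```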

Fast Robust Monocular Depth Estimation for Obstacle Detection with Fully Convolutional Networks

fangchangma/sparse-to-dense.pytorch 21 Jul 2016

We propose a novel appearance-based Object Detection system that is able to detect obstacles at very long range and at a very high speed (~300Hz), without making assumptions on the type of motion.

Revisiting Single Image Depth Estimation: Toward Higher Resolution Maps with Accurate Object Boundaries

JunjH/Revisiting_Single_Depth_Estimation 23 Mar 2018

Experimental results show that these two improvements enable higher accuracy than the current state of the art, owing to finer-resolution reconstruction, for example around small objects and object boundaries.

Towards real-time unsupervised monocular depth estimation on CPU

mattpoggi/pydnet 29 Jun 2018

To tackle this issue, in this paper we propose a novel architecture capable of quickly inferring an accurate depth map on a CPU, even on an embedded system, using a pyramid of features extracted from a single input image.
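A toy pyramid-of-features depth network in the same spirit: features are extracted at several scales, a coarse depth is predicted at the lowest resolution, then upsampled and refined level by level. Layer widths and the number of levels are illustrative, not PyD-Net's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPyramidDepth(nn.Module):
    """Illustrative pyramid-of-features depth network (toy layer sizes)."""
    def __init__(self, levels=3, ch=16):
        super().__init__()
        self.extractors = nn.ModuleList(
            [nn.Conv2d(3 if i == 0 else ch, ch, 3, stride=2, padding=1) for i in range(levels)]
        )
        # Each estimator sees the level's features plus the upsampled coarser depth.
        self.estimators = nn.ModuleList(
            [nn.Conv2d(ch + (0 if i == levels - 1 else 1), 1, 3, padding=1) for i in range(levels)]
        )

    def forward(self, x):
        feats = []
        for extract in self.extractors:        # build the feature pyramid (fine -> coarse)
            x = F.relu(extract(x))
            feats.append(x)

        depth = None
        for i in reversed(range(len(feats))):  # decode coarse -> fine
            f = feats[i]
            if depth is not None:
                depth = F.interpolate(depth, size=f.shape[-2:], mode="bilinear", align_corners=False)
                f = torch.cat([f, depth], dim=1)
            depth = self.estimators[i](f)
        return depth                           # depth at half the input resolution

print(TinyPyramidDepth()(torch.randn(1, 3, 192, 256)).shape)  # (1, 1, 96, 128)
```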

Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations

DrSleep/multi-task-refinenet 13 Sep 2018

Deployment of deep learning models in robotics as sensory information extractors can be a daunting task, even using generic GPU cards.

Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells

drsleep/nas-segm-pytorch CVPR 2019

While most results in this domain have been achieved on image classification and language modelling problems, here we concentrate on dense per-pixel tasks, in particular, semantic image segmentation using fully convolutional networks.