Unsupervised Depth Completion with Calibrated Backprojection Layers

ICCV 2021 · Alex Wong, Stefano Soatto

We propose a deep neural network architecture to infer dense depth from an image and a sparse point cloud. It is trained using a video stream and a corresponding synchronized sparse point cloud, as obtained from a LIDAR or other range sensor, along with the intrinsic calibration parameters of the camera. At inference time, the calibration of the camera, which can differ from the one used for training, is fed as an input to the network along with the sparse point cloud and a single image. A Calibrated Backprojection Layer backprojects each pixel in the image to three-dimensional space using the calibration matrix and a depth feature descriptor. The resulting 3D positional encoding is concatenated with the image descriptor and the previous layer's output to yield the input to the next layer of the encoder. A decoder, exploiting skip connections, produces a dense depth map. The resulting Calibrated Backprojection Network, or KBNet, is trained without supervision by minimizing the photometric reprojection error. KBNet imputes missing depth values based on the training set, rather than on generic regularization. We test KBNet on public depth completion benchmarks, where it outperforms the state of the art by 30.5% indoors and 8.8% outdoors when the same camera is used for training and testing. When the test camera is different, the improvement reaches 62%. Code is available at: https://github.com/alexklwong/calibrated-backprojection-network.
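To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch of one calibrated backprojection block. It assumes the standard pinhole lifting X = d · K⁻¹ [u, v, 1]ᵀ, with the scalar depth d taken from a learned depth branch; the module names, channel counts, and three-branch fusion layout are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class CalibratedBackprojectionBlock(nn.Module):
    # Hypothetical sketch, not the authors' code: one encoder block that
    # lifts each pixel to 3D using the camera intrinsics and fuses the
    # resulting positional encoding with image and depth features,
    # following the description in the abstract.
    def __init__(self, image_ch, depth_ch, fused_ch, out_ch):
        super().__init__()
        self.conv_image = nn.Conv2d(image_ch, out_ch, 3, padding=1)
        self.conv_depth = nn.Conv2d(depth_ch, out_ch, 3, padding=1)
        # The fused branch sees: previous fused features, image features,
        # and the 3-channel backprojected (x, y, z) positional encoding.
        self.conv_fused = nn.Conv2d(fused_ch + out_ch + 3, out_ch, 1)
        # Collapse the depth feature descriptor to one scalar depth per pixel.
        self.to_scalar_depth = nn.Conv2d(depth_ch, 1, 1)

    def forward(self, x_image, x_depth, x_fused, K_inv, pixels):
        # pixels: (B, 3, H, W) homogeneous pixel coordinates [u, v, 1].
        d = self.to_scalar_depth(x_depth)                     # (B, 1, H, W)
        rays = torch.einsum('bij,bjhw->bihw', K_inv, pixels)  # K^-1 [u v 1]^T
        xyz = rays * d                                        # lifted 3D points
        y_image = torch.relu(self.conv_image(x_image))
        y_depth = torch.relu(self.conv_depth(x_depth))
        y_fused = torch.relu(
            self.conv_fused(torch.cat([x_fused, y_image, xyz], dim=1)))
        return y_image, y_depth, y_fused
```

Stacking several such blocks would form the encoder; because the intrinsics K (and hence K⁻¹) enter as an input rather than being baked into the weights, the network can be run with a different camera calibration at test time, as the abstract describes.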


Datasets

KITTI Depth Completion · VOID

Results from the Paper


Task               Dataset                  Model   Metric         Value     Global Rank
Depth Completion   KITTI Depth Completion   KBNet   iRMSE [1/km]   2.95      #5
Depth Completion   KITTI Depth Completion   KBNet   iMAE [1/km]    1.02      #5
Depth Completion   KITTI Depth Completion   KBNet   RMSE [mm]      1069.47   #12
Depth Completion   KITTI Depth Completion   KBNet   MAE [mm]       256.76    #9
Depth Completion   KITTI Depth Completion   KBNet   Runtime [ms]   16        #3
Depth Completion   VOID                     KBNet   MAE [mm]       39.80     #2
Depth Completion   VOID                     KBNet   RMSE [mm]      95.86     #2
Depth Completion   VOID                     KBNet   iMAE [1/km]    21.16     #2
Depth Completion   VOID                     KBNet   iRMSE [1/km]   49.72     #2

Methods


Calibrated Backprojection Layer · Calibrated Backprojection Network (KBNet)