Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Neural networks have shown great abilities in estimating depth from a single image. However, the inferred depth maps are well below one-megapixel resolution and often lack fine-grained details, which limits their practicality. Our method builds on our analysis of how the input resolution and the scene structure affect depth estimation performance. We demonstrate that there is a trade-off between a consistent scene structure and high-frequency details, and we merge low- and high-resolution estimations to take advantage of this duality using a simple depth merging network. We present a double estimation method that improves the whole-image depth estimation and a patch selection method that adds local details to the final result. We demonstrate that by merging estimations at different resolutions with changing context, we can generate multi-megapixel depth maps with a high level of detail using a pre-trained model.
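To make the low-/high-resolution trade-off concrete, the following is a minimal sketch of one way two estimates could be combined. It is not the merging network described in the abstract: it swaps in a fixed two-band frequency split, taking the coarse structure from the low-resolution estimate and the fine detail from the high-resolution one. The function names (`block_mean`, `upsample_nearest`, `merge_estimates`) and the integer `factor` between the two resolutions are assumptions made for illustration.

```python
import numpy as np


def block_mean(depth, factor):
    """Downsample by averaging non-overlapping factor x factor blocks."""
    h, w = depth.shape
    return depth.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def upsample_nearest(depth, factor):
    """Upsample by repeating each pixel factor times along both axes."""
    return np.repeat(np.repeat(depth, factor, axis=0), factor, axis=1)


def merge_estimates(low_res, high_res, factor):
    """Hand-written stand-in for a learned merge (assumption, not the paper's network).

    Keeps the block-average structure of low_res and adds the residual
    detail of high_res around its own block averages.
    """
    expected = (low_res.shape[0] * factor, low_res.shape[1] * factor)
    if high_res.shape != expected:
        raise ValueError(f"high_res must have shape {expected}, got {high_res.shape}")
    base = upsample_nearest(low_res, factor)
    detail = high_res - upsample_nearest(block_mean(high_res, factor), factor)
    return base + detail


# Toy inputs standing in for two runs of a pre-trained depth model.
rng = np.random.default_rng(0)
low = np.array([[1.0, 2.0], [3.0, 4.0]])
high = rng.random((4, 4))
merged = merge_estimates(low, high, factor=2)
```

By construction, averaging `merged` over each 2 x 2 block returns `low`, so the coarse structure is preserved while the within-block variation comes entirely from `high`. A learned merge can decide per region which estimate to trust, which this fixed split cannot.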

CVPR 2021

Results from the Paper


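Task                        Dataset          Model                     Metric  Value   Global Rank
Monocular Depth Estimation  IBims-1          Miangoleh et al. (SGR)    ORD     0.3938  # 1
Monocular Depth Estimation  IBims-1          Miangoleh et al. (SGR)    D3R     0.3222  # 1
Monocular Depth Estimation  IBims-1          Miangoleh et al. (SGR)    RMSE    0.1598  # 1
Monocular Depth Estimation  IBims-1          Miangoleh et al. (SGR)    δ1.25   0.6390  # 1
Monocular Depth Estimation  IBims-1          Miangoleh et al. (MiDaS)  ORD     0.5538  # 2
Monocular Depth Estimation  IBims-1          Miangoleh et al. (MiDaS)  D3R     0.4671  # 2
Monocular Depth Estimation  IBims-1          Miangoleh et al. (MiDaS)  RMSE    0.1965  # 2
Monocular Depth Estimation  IBims-1          Miangoleh et al. (MiDaS)  δ1.25   0.7460  # 2
Monocular Depth Estimation  Middlebury 2014  Miangoleh et al. (MiDaS)  ORD     0.3467  # 1
Monocular Depth Estimation  Middlebury 2014  Miangoleh et al. (MiDaS)  D3R     0.1578  # 1
Monocular Depth Estimation  Middlebury 2014  Miangoleh et al. (MiDaS)  RMSE    0.1557  # 1
Monocular Depth Estimation  Middlebury 2014  Miangoleh et al. (MiDaS)  δ1.25   0.7406  # 1
Monocular Depth Estimation  Middlebury 2014  Miangoleh et al. (SGR)    ORD     0.3879  # 2
Monocular Depth Estimation  Middlebury 2014  Miangoleh et al. (SGR)    D3R     0.2324  # 2
Monocular Depth Estimation  Middlebury 2014  Miangoleh et al. (SGR)    RMSE    0.1973  # 2
Monocular Depth Estimation  Middlebury 2014  Miangoleh et al. (SGR)    δ1.25   0.7891  # 2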
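
For readers unfamiliar with two of the metrics above, here is a minimal sketch of how RMSE and a δ1.25 score might be computed. It rests on assumptions: the page does not define the metrics, so δ1.25 is taken here to be the fraction of pixels whose ratio max(pred/gt, gt/pred) exceeds 1.25, which would be consistent with lower values ranking first in the results. The sketch applies no scale or shift alignment between prediction and ground truth, which a real benchmark protocol may require. The function names are hypothetical.

```python
import numpy as np


def rmse(pred, gt):
    """Root-mean-square error over all pixels (no alignment step, by assumption)."""
    return float(np.sqrt(np.mean((pred - gt) ** 2)))


def delta_error(pred, gt, threshold=1.25):
    """Fraction of pixels whose depth ratio exceeds the threshold.

    Assumes strictly positive depths; the "exceeds" direction is an
    assumption inferred from the ranking order in the results.
    """
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio > threshold))


# Toy values: three exact pixels and one that is off by 50%.
gt = np.array([1.0, 2.0, 4.0, 8.0])
pred = np.array([1.0, 2.0, 4.0, 12.0])

print(rmse(pred, gt))         # 2.0
print(delta_error(pred, gt))  # 0.25
```

The single outlier dominates RMSE because errors are squared, while the δ score only counts it as one failing pixel out of four. This is one reason the two metrics can rank models differently.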
