Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions for both metric depth estimation and surface normal estimation. For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity from various camera models and large-scale data training. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problem and can be effortlessly plugged into existing monocular models. For surface normal estimation, we propose a joint depth-normal optimization module to distill diverse data knowledge from metric depth, enabling normal estimators to learn beyond normal labels. Equipped with these modules, our depth-normal models can be stably trained with over 16 million images from thousands of camera models with different types of annotations, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Our method enables the accurate recovery of metric 3D structures on randomly collected internet images, paving the way for plausible single-image metrology. Our project page is at https://JUGGHM.github.io/Metric3Dv2.
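
The abstract does not spell out how the canonical camera space transformation works. As a rough illustration of the underlying ambiguity, assume a pinhole camera: an object of size S at depth d projects to s = f * S / d pixels, so the same image is consistent with depth d under focal length f and with depth d * f_c / f under focal length f_c. The sketch below rescales depth labels to a shared canonical focal length and maps predictions back. The function names and the value of `CANONICAL_FOCAL` are hypothetical, and this label-scaling variant is an assumption rather than the paper's confirmed implementation.

```python
# Minimal sketch, assuming a pinhole camera and depth given as a flat list of
# metric values. CANONICAL_FOCAL is a hypothetical choice, not from the abstract.
CANONICAL_FOCAL = 1000.0  # focal length (pixels) of the assumed canonical camera


def to_canonical(depth, focal):
    """Rescale metric depth so it looks as if taken with the canonical camera."""
    scale = CANONICAL_FOCAL / focal
    return [d * scale for d in depth]


def from_canonical(depth_canonical, focal):
    """Map a canonical-space prediction back to metric depth for the real camera."""
    scale = focal / CANONICAL_FOCAL
    return [d * scale for d in depth_canonical]


if __name__ == "__main__":
    labels = [2.0, 5.0]          # metric depth in metres
    focal = 500.0                # focal length of the capturing camera, in pixels
    canonical = to_canonical(labels, focal)
    print(canonical)
    print(from_canonical(canonical, focal))
```

Under this assumption a network only ever sees depths expressed for one camera, and the known focal length of the test image is what restores metric scale at inference time.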

Under review 2024

Results from the Paper


 Ranked #1 on Surface Normals Estimation on NYU Depth v2 (using extra training data)

Task                        Dataset            Model                                    Metric Name              Metric Value  Global Rank
Surface Normals Estimation  IBims-1            Metric3Dv2(g2, ZS)                       % < 11.25                69.7          # 1
                                                                                        % < 22.5                 76.2          # 1
                                                                                        % < 30                   78.8          # 1
                                                                                        Mean                     19.6          # 1
Monocular Depth Estimation  IBims-1            Metric3D-v2(L, ZS)                       δ1.25                    0.969         # 3
Monocular Depth Estimation  KITTI Eigen split  Metric3Dv2 (g2, FT, 80m, flip_aug_test)  absolute relative error  0.039         # 1
                                                                                        RMSE                     1.766         # 5
                                                                                        RMSE log                 0.060         # 2
                                                                                        Delta < 1.25             0.989         # 1
                                                                                        Delta < 1.25^2           0.998         # 1
                                                                                        Delta < 1.25^3           1.000         # 1
Surface Normals Estimation  NYU Depth v2       Metric3Dv2(L, FT)                        % < 11.25                68.8          # 1
                                                                                        % < 22.5                 84.9          # 1
                                                                                        % < 30                   89.8          # 1
                                                                                        Mean Angle Error         12.0          # 1
                                                                                        RMSE                     19.2          # 1
Monocular Depth Estimation  NYU-Depth V2       Metric3Dv2(L, FT)                        RMSE                     0.183         # 1
                                                                                        absolute relative error  0.047         # 1
                                                                                        Delta < 1.25             0.989         # 1
                                                                                        Delta < 1.25^2           0.998         # 1
                                                                                        Delta < 1.25^3           1.000         # 1
                                                                                        log 10                   0.020         # 1
Surface Normals Estimation  ScanNetV2          Metric3Dv2 (g2, In-domain)               % < 11.25                77.8          # 1
                                                                                        % < 22.5                 90.1          # 1
                                                                                        % < 30                   93.5          # 1
                                                                                        Mean Angle Error         9.2           # 1
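
The metric names in the results follow the usual conventions for depth and surface normal benchmarks. As a reading aid, the sketch below computes them under their commonly used definitions: absolute relative error, RMSE, RMSE log, log 10, and the Delta < 1.25^k accuracies for depth, plus mean angular error, angular RMSE, and the share of pixels under 11.25, 22.5, and 30 degrees for normals. The function names and the flat-list input format are assumptions made for illustration. Each benchmark's own protocol (valid masks, depth caps such as the 80m setting, test-time flipping) is not reproduced here.

```python
import math


def depth_metrics(pred, gt):
    """Common depth metrics over pixels where both values are positive."""
    pairs = [(p, g) for p, g in zip(pred, gt) if p > 0 and g > 0]
    if not pairs:
        raise ValueError("no valid pixels")
    n = len(pairs)
    ratios = [max(p / g, g / p) for p, g in pairs]
    out = {
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }
    # Fraction of pixels whose prediction is within a factor 1.25^k of the truth.
    for k in (1, 2, 3):
        out[f"delta_{k}"] = sum(r < 1.25 ** k for r in ratios) / n
    return out


def normal_metrics(pred, gt):
    """Angular error statistics (degrees) between predicted and true normals."""
    angles = []
    for a, b in zip(pred, gt):
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        cos = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
        cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
        angles.append(math.degrees(math.acos(cos)))
    n = len(angles)
    out = {
        "mean": sum(angles) / n,
        "rmse": math.sqrt(sum(a * a for a in angles) / n),
    }
    for t in (11.25, 22.5, 30.0):
        out[f"pct_lt_{t}"] = 100.0 * sum(a < t for a in angles) / n
    return out


if __name__ == "__main__":
    print(depth_metrics([1.5, 2.0], [1.0, 2.0]))
    print(normal_metrics([(0.0, 0.0, 1.0)], [(0.0, 0.1, 1.0)]))
```

Lower is better for the error metrics (absolute relative error, RMSE, RMSE log, log 10, mean angle error), and higher is better for the Delta and percentage-under-threshold metrics.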
