3D-LFM: Lifting Foundation Model

19 Dec 2023 · Mosam Dabhi, Laszlo A. Jeni, Simon Lucey

The lifting of 3D structure and camera from 2D landmarks is a cornerstone of the entire discipline of computer vision. Traditional methods have been confined to specific rigid objects, such as those in Perspective-n-Point (PnP) problems, but deep learning has expanded our capability to reconstruct a wide range of object classes (e.g., C3DPO and PAUL) with resilience to noise, occlusions, and perspective distortions. All these techniques, however, have been limited by the fundamental need to establish correspondences across the 3D training data -- significantly limiting their utility to applications where one has an abundance of "in-correspondence" 3D data. Our approach harnesses the inherent permutation equivariance of transformers to handle a varying number of points per 3D data instance, withstand occlusions, and generalize to unseen categories. We demonstrate state-of-the-art performance across 2D-3D lifting task benchmarks. Since our approach can be trained across such a broad class of structures, we refer to it simply as a 3D Lifting Foundation Model (3D-LFM) -- the first of its kind.
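
The following is a minimal PyTorch sketch, not the authors' released code; the class name, layer sizes, and masking scheme are assumptions used only to illustrate the permutation-equivariance idea from the abstract. With no positional encodings added, a transformer encoder treats the 2D landmarks as an unordered set, so the same weights can lift skeletons with different point counts, and a key-padding mask lets occluded points be ignored.

```python
import torch
import torch.nn as nn

class LiftingTransformer(nn.Module):
    """Hypothetical permutation-equivariant 2D-to-3D lifter (illustrative only).

    No positional encoding is applied to the point tokens, so the encoder
    is equivariant to point ordering and agnostic to the number of points.
    """

    def __init__(self, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        self.embed = nn.Linear(2, d_model)   # each 2D landmark -> one token
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 3)    # each token -> one 3D point

    def forward(self, pts2d, occluded=None):
        # pts2d: (B, N, 2) 2D landmarks; N may differ between categories.
        # occluded: optional (B, N) bool mask, True where a point is missing;
        # masked points are excluded from attention.
        tok = self.embed(pts2d)
        tok = self.encoder(tok, src_key_padding_mask=occluded)
        return self.head(tok)                # (B, N, 3) lifted 3D points

# The same weights handle a 17-point body skeleton and a 21-point hand.
model = LiftingTransformer()
body17 = model(torch.randn(1, 17, 2))
hand21 = model(torch.randn(1, 21, 2),
               occluded=torch.zeros(1, 21, dtype=torch.bool))
```

Because nothing in this design depends on point order or count, shuffling the input landmarks simply shuffles the predicted 3D points in the same way, which is what allows a single model to train across categories with different keypoint layouts.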

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|------|---------|-------|-------------|--------------|-------------|
| 3D Hand Pose Estimation | H3WB | 3D-LFM | Average MPJPE (mm) | 28.22 | #1 |
| 3D Facial Landmark Localization | H3WB | 3D-LFM | Average MPJPE (mm) | 10.44 | #1 |
| 3D Human Pose Estimation | H3WB | 3D-LFM | MPJPE (mm) | 60.83 | #1 |
| 3D Human Pose Estimation | Human3.6M | 3D-LFM | Average MPJPE (mm) | 31.89 | #41 |
| 3D Human Pose Estimation | Human3.6M | 3D-LFM | Using 2D ground-truth joints | Yes | #2 |
| 3D Human Pose Estimation | Human3.6M | 3D-LFM | Multi-View or Monocular | Monocular | #1 |
