Sampling is Matter: Point-guided 3D Human Mesh Reconstruction

This paper presents a simple yet powerful method for 3D human mesh reconstruction from a single RGB image. Recently, non-local interactions among all mesh vertices have been modeled effectively with transformers, and relationships between body parts have begun to be handled via graph models. Although these approaches have shown remarkable progress in 3D human mesh reconstruction, it remains difficult to directly infer the relationship between features encoded from the 2D input image and the 3D coordinates of each vertex. To resolve this problem, we propose a simple feature sampling scheme. The key idea is to sample features in the embedded space under the guidance of points that are supervised by the 2D projections of the ground-truth 3D mesh vertices. This helps the model concentrate on vertex-relevant features in the 2D space, leading to the reconstruction of natural human poses. Furthermore, we apply progressive attention masking to precisely estimate local interactions between vertices even under severe occlusions. Experimental results on benchmark datasets show that the proposed method efficiently improves the performance of 3D human mesh reconstruction. The code and model are publicly available at: https://github.com/DCVL-3D/PointHMR_release.
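
As a rough illustration of the point-guided sampling idea described in the abstract, the sketch below samples backbone features at normalized 2D vertex projections with PyTorch's F.grid_sample. The function name, tensor shapes, and vertex count are assumptions chosen for illustration, not the authors' implementation.

```python
# Hypothetical sketch: sample image features at projected mesh-vertex locations,
# so that each vertex token is built from vertex-relevant regions of the feature map.
import torch
import torch.nn.functional as F

def sample_vertex_features(feature_map, vertex_uv):
    """
    feature_map: (B, C, H, W) features encoded from the RGB image.
    vertex_uv:   (B, V, 2) projected 2D vertex locations, normalized to [-1, 1]
                 (during training these can be supervised by projecting the
                  ground-truth 3D mesh vertices into the image).
    returns:     (B, V, C) one feature vector per mesh vertex.
    """
    # grid_sample expects a sampling grid of shape (B, H_out, W_out, 2);
    # treat the V vertices as a 1 x V sampling grid.
    grid = vertex_uv.unsqueeze(1)                                   # (B, 1, V, 2)
    sampled = F.grid_sample(feature_map, grid,
                            mode='bilinear', align_corners=False)   # (B, C, 1, V)
    return sampled.squeeze(2).permute(0, 2, 1)                      # (B, V, C)

# Example usage with dummy tensors (vertex count 431 is an arbitrary example).
feats = torch.randn(2, 256, 56, 56)     # backbone feature map
uv = torch.rand(2, 431, 2) * 2 - 1      # projected vertex coordinates in [-1, 1]
tokens = sample_vertex_features(feats, uv)
print(tokens.shape)                     # torch.Size([2, 431, 256])
```

The sampled per-vertex features could then be fed to a transformer, where progressive attention masking (as mentioned in the abstract) restricts each vertex to attend to nearby vertices at early stages and gradually widens the attention range.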

CVPR 2023

Results from the Paper


Ranked #19 on Monocular 3D Human Pose Estimation on Human3.6M (using extra training data)
Task                                Dataset     Model      Metric Name              Metric Value   Global Rank
3D Human Pose Estimation            3DPW        PointHMR   PA-MPJPE                 44.9           #33
3D Human Pose Estimation            3DPW        PointHMR   MPJPE                    73.9           #32
3D Human Pose Estimation            3DPW        PointHMR   MPVPE                    85.5           #26
Monocular 3D Human Pose Estimation  Human3.6M   PointHMR   Average MPJPE (mm)       48.3           #19
Monocular 3D Human Pose Estimation  Human3.6M   PointHMR   PA-MPJPE                 32.9           #3
3D Human Pose Estimation            Human3.6M   PointHMR   Average MPJPE (mm)       48.3           #140
3D Human Pose Estimation            Human3.6M   PointHMR   Multi-View or Monocular  Monocular      #1
3D Human Pose Estimation            Human3.6M   PointHMR   PA-MPJPE                 32.9           #16

Methods


No methods listed for this paper.