HEMlets Pose: Learning Part-Centric Heatmap Triplets for Accurate 3D Human Pose Estimation

ICCV 2019 · Kun Zhou, Xiaoguang Han, Nianjuan Jiang, Kui Jia, Jiangbo Lu

Estimating 3D human pose from a single image is a challenging task. This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state, Part-Centric Heatmap Triplets (HEMlets), which shortens the gap between the 2D observation and the 3D interpretation. The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part. In our approach, a Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression. We leverage the integral operation to extract the joint locations from the volumetric heatmaps, guaranteeing end-to-end learning. Despite the simplicity of the network design, the quantitative comparisons show a significant performance improvement over the best-of-grade method (by 20% on Human3.6M). The proposed method naturally supports training with "in-the-wild" images, where only weakly-annotated relative depth information of skeletal joints is available. This further improves the generalization ability of our model, as validated by qualitative comparisons on outdoor images.
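The integral operation mentioned in the abstract can be illustrated with a short NumPy sketch. It treats the volumetric heatmap of one joint as a probability distribution (via a softmax) and returns the expected voxel coordinate, which is differentiable, unlike a hard argmax. The function name `integral_joint_location`, the `(depth, height, width)` layout, and the use of a plain softmax are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import numpy as np


def integral_joint_location(volume):
    """Expected (x, y, z) voxel coordinate for one joint.

    `volume` is a (depth, height, width) array of raw scores.
    The layout and the softmax normalisation are illustrative assumptions.
    """
    depth, height, width = volume.shape

    # Softmax over all voxels; subtracting the max keeps exp() stable.
    prob = np.exp(volume - volume.max())
    prob /= prob.sum()

    # Marginalise the distribution onto each axis.
    pz = prob.sum(axis=(1, 2))
    py = prob.sum(axis=(0, 2))
    px = prob.sum(axis=(0, 1))

    # The expectation of the index along each axis is the joint coordinate.
    x = float((px * np.arange(width)).sum())
    y = float((py * np.arange(height)).sum())
    z = float((pz * np.arange(depth)).sum())
    return np.array([x, y, z])


# Usage: a sharply peaked volume yields (almost exactly) the peak location.
scores = np.zeros((4, 6, 8))
scores[2, 3, 5] = 50.0
peak_xyz = integral_joint_location(scores)
```

Because the output is a weighted average over all voxels, gradients flow from a coordinate loss back into every heatmap value, which is what makes end-to-end training possible in this style of model.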


Results from the Paper


 Ranked #1 on Monocular 3D Human Pose Estimation on Human3.6M (Use Video Sequence metric, using extra training data)

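| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Monocular 3D Human Pose Estimation | Human3.6M | HEMlets Pose (H36M+MPII) | Average MPJPE (mm) | 39.9 | # 7 |
| Monocular 3D Human Pose Estimation | Human3.6M | HEMlets Pose (H36M+MPII) | Frames Needed | 1 | # 1 |
| Monocular 3D Human Pose Estimation | Human3.6M | HEMlets Pose (H36M+MPII) | PA-MPJPE | 27.9 | # 1 |
| 3D Human Pose Estimation | Human3.6M | HEMlets Pose (H36M+MPII) | Average MPJPE (mm) | 39.9 | # 71 |
| 3D Human Pose Estimation | Human3.6M | HEMlets Pose (H36M+MPII) | Using 2D ground-truth joints | No | # 2 |
| 3D Human Pose Estimation | Human3.6M | HEMlets Pose (H36M+MPII) | Multi-View or Monocular | Monocular | # 1 |
| 3D Human Pose Estimation | Human3.6M | HEMlets Pose (H36M+MPII) | PA-MPJPE | 27.9 | # 3 |
| Monocular 3D Human Pose Estimation | Human3.6M | HEMlets Pose | Use Video Sequence | No | # 1 |
| Monocular 3D Human Pose Estimation | Human3.6M | HEMlets Pose | Frames Needed | 1 | # 1 |
| Monocular 3D Human Pose Estimation | Human3.6M | HEMlets Pose | Need Ground Truth 2D Pose | No | # 1 |
| 3D Human Pose Estimation | Human3.6M | HEMlets Pose | Average MPJPE (mm) | 45.1 | # 110 |
| 3D Human Pose Estimation | Human3.6M | HEMlets Pose | Using 2D ground-truth joints | No | # 2 |
| 3D Human Pose Estimation | Human3.6M | HEMlets Pose | Multi-View or Monocular | Monocular | # 1 |
| 3D Human Pose Estimation | HumanEva-I | HEMlets Pose | Mean Reconstruction Error (mm) | 15.2 | # 5 |
| 3D Human Pose Estimation | MPI-INF-3DHP | HEMlets Pose | AUC | 38 | # 69 |
| 3D Human Pose Estimation | MPI-INF-3DHP | HEMlets Pose | PCK | 75.3 | # 75 |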
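For readers unfamiliar with the two main metrics in the results above, the sketch below shows how they are commonly computed. MPJPE is the mean Euclidean distance between predicted and ground-truth joints; PA-MPJPE is the same error after the prediction has been aligned to the ground truth with a similarity transform (Procrustes alignment). The function names and the 17-joint toy skeleton are assumptions for illustration, and benchmark-specific conventions (such as root-joint centering or the evaluation protocol) are not modelled here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) arrays, in the input units."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Align `pred` to `gt` with the best scale, rotation and translation."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g

    # SVD of the cross-covariance gives the optimal rotation (Kabsch).
    u, s, vt = np.linalg.svd(p.T @ g)
    d = np.ones(3)
    if np.linalg.det(u @ vt) < 0:
        d[-1] = -1.0  # flip one axis so the result is a proper rotation
    rot = (u * d) @ vt

    # Least-squares optimal isotropic scale.
    scale = (s * d).sum() / (p ** 2).sum()
    return scale * p @ rot + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of the prediction."""
    return mpjpe(procrustes_align(pred, gt), gt)


# Usage: a rotated, scaled and shifted copy of a toy pose has a large MPJPE
# but a PA-MPJPE of (numerically) zero.
rng = np.random.default_rng(0)
gt_pose = rng.normal(size=(17, 3))  # hypothetical 17-joint skeleton
c, s_ = np.cos(0.7), np.sin(0.7)
rot_z = np.array([[c, -s_, 0.0], [s_, c, 0.0], [0.0, 0.0, 1.0]])
pred_pose = 2.0 * gt_pose @ rot_z + np.array([0.5, -1.0, 3.0])
raw_error = mpjpe(pred_pose, gt_pose)
aligned_error = pa_mpjpe(pred_pose, gt_pose)
```

Since alignment removes global scale, rotation and translation, PA-MPJPE is never meaningfully larger than MPJPE for the same prediction, which is consistent with the 27.9 versus 39.9 figures reported above.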

Methods