Predicting Camera Viewpoint Improves Cross-dataset Generalization for 3D Human Pose Estimation

7 Apr 2020  ·  Zhe Wang, Daeyun Shin, Charless C. Fowlkes ·

Monocular estimation of 3D human pose has attracted increased attention with the availability of large ground-truth motion capture datasets. However, the diversity of available training data is limited, and it is not clear to what extent methods generalize outside the specific datasets they are trained on. In this work we carry out a systematic study of the diversity and biases present in specific datasets and their effect on cross-dataset generalization across a compendium of five pose datasets. We specifically focus on systematic differences in the distribution of camera viewpoints relative to a body-centered coordinate frame. Based on this observation, we propose an auxiliary task of predicting the camera viewpoint in addition to pose. We find that models trained to jointly predict viewpoint and pose consistently show significantly improved cross-dataset generalization.
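The abstract describes a multi-task setup: in addition to the 3D pose, the model predicts the camera viewpoint relative to a body-centered frame. A minimal NumPy sketch of that idea is below; the two linear heads, the quaternion viewpoint parameterization, the joint count `J`, feature size `FEAT`, and loss weight `lam` are all assumptions for illustration, not the authors' actual architecture or hyperparameters.

```python
import numpy as np

J = 17       # number of body joints (assumption)
FEAT = 256   # shared backbone feature dimension (assumption)

rng = np.random.default_rng(0)
W_pose = rng.normal(scale=0.01, size=(FEAT, 3 * J))  # pose head weights
W_view = rng.normal(scale=0.01, size=(FEAT, 4))      # viewpoint head weights

def forward(feat):
    """One shared feature vector feeds two heads: body-centered 3D
    pose and camera viewpoint (here a unit quaternion)."""
    pose = (feat @ W_pose).reshape(-1, J, 3)
    q = feat @ W_view
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)  # normalize to unit quaternion
    return pose, q

def joint_loss(pose, q, pose_gt, q_gt, lam=0.1):
    """Pose loss plus a weighted auxiliary viewpoint loss; lam is an
    assumed weighting, not a value from the paper."""
    l_pose = np.mean(np.sum((pose - pose_gt) ** 2, axis=-1))
    # quaternion distance, invariant to the q / -q sign ambiguity
    l_view = np.mean(1.0 - np.abs(np.sum(q * q_gt, axis=-1)))
    return l_pose + lam * l_view
```

In practice the shared feature would come from a CNN backbone and both heads would be trained end-to-end; the sketch only shows how the auxiliary viewpoint target attaches to the same representation as the pose target.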


Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| 3D Human Pose Estimation | 3DPW | Cross Dataset Generalization | PA-MPJPE | 65.2 | #107 |
| 3D Human Pose Estimation | 3DPW | Cross Dataset Generalization | MPJPE | 89.7 | #89 |
| 3D Human Pose Estimation | Geometric Pose Affordance | Cross Dataset Generalization | MPJPE | 53.3 | #1 |
| 3D Human Pose Estimation | Human3.6M | Cross Dataset Generalization | Average MPJPE (mm) | 52 | #191 |
| 3D Human Pose Estimation | Human3.6M | Cross Dataset Generalization | Using 2D ground-truth joints | Yes | #2 |
| 3D Human Pose Estimation | Human3.6M | Cross Dataset Generalization | Multi-View or Monocular | Monocular | #1 |
| 3D Human Pose Estimation | Human3.6M | Cross Dataset Generalization | PA-MPJPE | 42.5 | #73 |
| Monocular 3D Human Pose Estimation | Human3.6M cross-dataset-evaluation | Cross Dataset Generalization | Average MPJPE (mm) | 52.0 | #23 |
| Monocular 3D Human Pose Estimation | Human3.6M cross-dataset-evaluation | Cross Dataset Generalization | Use Video Sequence | No | #1 |
| Monocular 3D Human Pose Estimation | Human3.6M cross-dataset-evaluation | Cross Dataset Generalization | Frames Needed | 1 | #1 |
| Monocular 3D Human Pose Estimation | Human3.6M cross-dataset-evaluation | Cross Dataset Generalization | Need Ground Truth 2D Pose | No | #1 |
| 3D Human Pose Estimation | MPI-INF-3DHP | Cross Dataset Generalization | MPJPE | 90.3 | #43 |
| 3D Human Pose Estimation | MPI-INF-3DHP | Cross Dataset Generalization | PCK | 84.3 | #46 |
| 3D Human Pose Estimation | Surreal | Cross Dataset Generalization | MPJPE | 37.1 | #2 |
| 3D Human Pose Estimation | Surreal | Cross Dataset Generalization | PCK | 97.3 | #1 |
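The table reports MPJPE (mean per-joint position error, in mm) and PA-MPJPE (the same error after Procrustes alignment of the prediction to the ground truth). These metric definitions are standard in 3D pose estimation; here is a minimal NumPy sketch of both (function names are my own, not from the paper's code):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: average Euclidean distance
    between predicted and ground-truth 3D joint positions."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: rigidly align pred to gt with the
    optimal similarity transform (rotation, scale, translation),
    then measure MPJPE on the aligned joints."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g      # center both point sets
    H = p.T @ g                        # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T                     # optimal rotation
    if np.linalg.det(R) < 0:           # fix an improper reflection
        Vt[-1] *= -1
        S[-1] *= -1
        R = Vt.T @ U.T
    s = S.sum() / (p ** 2).sum()       # optimal scale
    aligned = s * p @ R.T + mu_g       # apply the similarity transform
    return mpjpe(aligned, gt)
```

Because PA-MPJPE factors out global rotation, scale, and translation, it isolates errors in body articulation, which is why it is typically lower than raw MPJPE for the same predictions.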
