Self-Attentive 3D Human Pose and Shape Estimation from Videos

26 Mar 2021  ·  Yun-Chun Chen, Marco Piccirilli, Robinson Piramuthu, Ming-Hsuan Yang

We consider the task of estimating 3D human pose and shape from videos. While existing frame-based approaches have made significant progress, these methods are independently applied to each image, thereby often leading to inconsistent predictions. In this work, we present a video-based learning algorithm for 3D human pose and shape estimation. The key insights of our method are two-fold. First, to address the inconsistent temporal prediction issue, we exploit temporal information in videos and propose a self-attention module that jointly considers short-range and long-range dependencies across frames, resulting in temporally coherent estimations. Second, we model human motion with a forecasting module that allows the transition between adjacent frames to be smooth. We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets. Extensive experimental results show that our algorithm performs favorably against the state-of-the-art methods.
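The abstract credits the temporal coherence to a self-attention module that mixes information across all frames, near and far. The paper's own code is not reproduced here; the following is a minimal NumPy sketch of scaled dot-product self-attention over a sequence of per-frame features, with random matrices standing in for the learned projections (the function name, dimensions, and weights are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(feats, d_k=32, seed=0):
    """Aggregate per-frame features across time with self-attention.

    feats: (T, D) array, one D-dim feature per video frame.
    Returns: (T, D) features where each frame is a weighted mix of
    all T frames, so both short- and long-range dependencies are
    captured without a fixed temporal window.
    """
    T, D = feats.shape
    rng = np.random.default_rng(seed)
    # Random stand-ins for learned projection matrices (hypothetical).
    Wq, Wk, Wv = (rng.standard_normal((D, d_k)) / np.sqrt(D) for _ in range(3))
    Wo = rng.standard_normal((d_k, D)) / np.sqrt(d_k)
    Q, K, V = feats @ Wq, feats @ Wk, feats @ Wv
    # (T, T) attention map: entry (t, t') is how much frame t attends
    # to frame t', regardless of how far apart they are in time.
    attn = softmax(Q @ K.T / np.sqrt(d_k))
    return attn @ V @ Wo
```

In an actual model, the projections would be trained end-to-end and the attended features fed to the pose/shape regressor; the sketch only shows why attention yields temporally coherent estimates, since each frame's output is a convex combination over the whole clip.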

Task                       Dataset        Model           Metric               Value   Global Rank
3D Human Pose Estimation   3DPW           Self-Attentive  PA-MPJPE             50.4    #55
                                                          MPJPE                85.8    #77
                                                          MPVPE                100.6   #55
                                                          Acceleration Error   77.9    #23
3D Human Pose Estimation   Human3.6M      Self-Attentive  Average MPJPE (mm)   58.9    #241
                                                          PA-MPJPE             38.7    #43
3D Human Pose Estimation   MPI-INF-3DHP   Self-Attentive  MPJPE                94.3    #53
                                                          PA-MPJPE             60.7    #5
                                                          PCK                  90.1    #26
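The table's headline metrics, MPJPE and PA-MPJPE, are standard in 3D pose evaluation: mean per-joint position error in millimeters, before and after a similarity (Procrustes) alignment of the prediction to the ground truth. A short NumPy sketch of these standard definitions (not the paper's evaluation code) makes the distinction concrete:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance
    between predicted and ground-truth joints, shape (J, 3) each."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def pa_mpjpe(pred, gt):
    """Procrustes-aligned MPJPE: remove global rotation, scale, and
    translation with a similarity alignment, then compute MPJPE."""
    mu_p, mu_g = pred.mean(0), gt.mean(0)
    P, G = pred - mu_p, gt - mu_g
    # Orthogonal Procrustes: best rotation maximizes tr(R^T P^T G).
    U, S, Vt = np.linalg.svd(P.T @ G)
    # Fix a possible reflection so the alignment is a proper rotation.
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    s = (S * np.diag(D)).sum() / (P ** 2).sum()   # optimal scale
    aligned = s * P @ R + mu_g
    return mpjpe(aligned, gt)
```

PA-MPJPE is always at most MPJPE on the same prediction, which is why the PA-MPJPE columns above carry smaller values; it isolates articulation error from global pose error.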
