Modulated Graph Convolutional Network for 3D Human Pose Estimation

ICCV 2021  ·  Zhiming Zou, Wei Tang ·

The graph convolutional network (GCN) has recently achieved promising performance of 3D human pose estimation (HPE) by modeling the relationship among body parts. However, most prior GCN approaches suffer from two main drawbacks. First, they share a feature transformation for each node within a graph convolution layer. This prevents them from learning different relations between different body joints. Second, the graph is usually defined according to the human skeleton and is suboptimal because human activities often exhibit motion patterns beyond the natural connections of body joints. To address these limitations, we introduce a novel Modulated GCN for 3D HPE. It consists of two main components: weight modulation and affinity modulation. Weight modulation learns different modulation vectors for different nodes so that the feature transformations of different nodes are disentangled while retaining a small model size. Affinity modulation adjusts the graph structure in a GCN so that it can model additional edges beyond the human skeleton. We investigate several affinity modulation methods as well as the impact of regularizations. Rigorous ablation study indicates both types of modulation improve performance with negligible overhead. Compared with state-of-the-art GCNs for 3D HPE, our approach either significantly reduces the estimation errors, e.g., by around 10%, while retaining a small model size or drastically reduces the model size, e.g., from 4.22M to 0.29M (a 14.5 X reduction), while achieving comparable performance. Results on two benchmarks show our Modulated GCN outperforms some recent states of the art. Our code is available at https://github.com/ZhimingZo/Modulated-GCN.

PDF Abstract

Results from the Paper


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
3D Human Pose Estimation Human3.6M Modulated-GCN Average MPJPE (mm) 49.4 # 154
Using 2D ground-truth joints No # 2
Multi-View or Monocular Monocular # 1
3D Human Pose Estimation MPI-INF-3DHP Modulated-GCN AUC 53.7 # 39
PCK 86.1 # 42

Methods