Dive Deeper Into Integral Pose Regression

ICLR 2022 · Kerui Gu, Linlin Yang, Angela Yao

Integral pose regression (IPR) combines an implicit heatmap with end-to-end training for human body and hand pose estimation. Unlike detection-based heatmap methods, which decode final joint positions from the heatmap with a non-differentiable argmax operation, integral regression methods apply a differentiable expectation operation. The differentiable decoding allows end-to-end training directly from ground-truth coordinates, though this slows down the learning process. It also leads to curious differences in performance, i.e., integral regression is competitive with or better than detection-based methods on "hard" samples but performs worse on "easy" samples. We do a deep dive into the inference and back-propagation of integral pose regression to better understand the causes behind the performance and training differences. For inference, we give theoretical support that the expectation operation is better than the argmax operation, but in practice this advantage appears only in hard cases, because the activation region of the heatmap shrinks in easy cases. We then show experimentally that this shrinkage of the activation region is one of the main causes of the inferior performance of integral regression. For back-propagation, we theoretically and empirically analyze the gradients to explain the slow training speed. Our analysis of the expectation operation and of backward propagation gives insight into IPR, and we use experiments to demonstrate that IPR can surpass the performance of detection-based methods.
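The difference between the two decoding operations can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the softmax normalization of the heatmap logits, and the synthetic Gaussian-shaped logits are all assumptions made for illustration.

```python
import numpy as np


def argmax_decode(logits):
    """Detection-style decoding: pixel of the maximum response.

    Returns integer-valued (x, y); the operation is not differentiable.
    """
    row, col = np.unravel_index(np.argmax(logits), logits.shape)
    return float(col), float(row)


def expectation_decode(logits):
    """Integral-style decoding: expected coordinate under the normalized heatmap.

    Assumption: the heatmap is normalized with a softmax over all pixels.
    """
    h, w = logits.shape
    shifted = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum()
    # Marginalize over rows/columns, then take the expectation per axis.
    x = float((probs.sum(axis=0) * np.arange(w)).sum())
    y = float((probs.sum(axis=1) * np.arange(h)).sum())
    return x, y


def gaussian_logits(h, w, cx, cy, sigma):
    """Hypothetical test input: logits whose softmax is a Gaussian at (cx, cy)."""
    ys, xs = np.mgrid[0:h, 0:w]
    return -((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2)


if __name__ == "__main__":
    logits = gaussian_logits(64, 64, cx=20.3, cy=40.7, sigma=2.0)
    print("argmax     :", argmax_decode(logits))
    print("expectation:", expectation_decode(logits))
```

On this synthetic input, argmax can only return the nearest pixel, whereas the expectation recovers a sub-pixel location and, being a weighted sum, lets gradients flow to every heatmap pixel. With a softmax over raw, unscaled activations, the expectation also depends on how concentrated the heatmap is, which relates to the activation-region behavior the abstract discusses.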
