
3D Pose Detection in Videos: Focusing on Occlusion

In this work, we build upon existing methods for occlusion-aware 3D pose detection in videos. We implement a two-stage architecture: a stacked hourglass network produces 2D pose predictions, which are then fed into a temporal convolutional network to produce 3D pose predictions. To facilitate prediction on poses with occluded joints, we introduce an intuitive generalization of the cylinder man model used to generate occlusion labels. Our occlusion-aware network achieves a mean per-joint position error (MPJPE) on the Human3.6M dataset that is 5 mm lower than that of our linear baseline model, and 0.1 mm lower than that of our temporal convolutional network baseline at reduced computational cost.
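To make the two-stage pipeline concrete, the sketch below illustrates the second stage: a temporal convolutional network that lifts a window of 2D joint coordinates to a 3D pose. This is our own minimal illustration, not the authors' code; the class name `TemporalLifter`, the joint count (17, the Human3.6M convention), the channel width, and the dilation schedule are all assumptions, loosely following the dilated temporal convolutions commonly used for 2D-to-3D lifting.

```python
import torch
import torch.nn as nn

class TemporalLifter(nn.Module):
    """Dilated temporal convolutions that lift 2D joint sequences to 3D."""
    def __init__(self, num_joints=17, channels=1024):
        super().__init__()
        # Stacking dilations 1, 3, 9 gives a 27-frame receptive field.
        self.net = nn.Sequential(
            nn.Conv1d(num_joints * 2, channels, kernel_size=3, dilation=1),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, dilation=3),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, dilation=9),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, num_joints * 3, kernel_size=1),
        )

    def forward(self, joints_2d):
        # joints_2d: (batch, frames, joints, 2) from the 2D detector
        b, t, j, _ = joints_2d.shape
        x = joints_2d.reshape(b, t, j * 2).transpose(1, 2)  # (b, j*2, t)
        out = self.net(x)                                   # (b, j*3, t')
        return out.transpose(1, 2).reshape(b, -1, j, 3)     # (b, t', j, 3)

model = TemporalLifter().eval()  # eval(): BatchNorm works with batch size 1
with torch.no_grad():
    pose_3d = model(torch.randn(1, 27, 17, 2))  # one 27-frame window
print(pose_3d.shape)  # torch.Size([1, 1, 17, 3]): 3D pose of the center frame
```

For intuition about cylinder-style occlusion labeling, the next sketch approximates limbs as capsules (segments with a radius) and marks a joint occluded when the ray from the camera to that joint passes inside a capsule belonging to another limb. This is a crude sampled approximation of the general idea only; the function name, sampling scheme, and geometry are illustrative assumptions, and the paper's generalization of the cylinder man model is not reproduced here.

```python
import numpy as np

def point_segment_distance(p, a, b):
    # Shortest distance from point p to the segment from a to b.
    ab, ap = b - a, p - a
    t = np.clip(ap.dot(ab) / (ab.dot(ab) + 1e-9), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))

def occlusion_labels(joints, limbs, radii, cam, n_samples=64):
    """joints: (J, 3) 3D positions; limbs: joint-index pairs; radii:
    per-limb capsule radius; cam: (3,) camera center. A joint is labeled
    occluded if the camera-to-joint ray enters another limb's capsule."""
    labels = np.zeros(len(joints), dtype=bool)
    for j, joint in enumerate(joints):
        # Sample points along the ray, excluding the endpoints.
        for t in np.linspace(0.05, 0.95, n_samples):
            p = cam + t * (joint - cam)
            for (i0, i1), r in zip(limbs, radii):
                if j in (i0, i1):
                    continue  # skip limbs attached to the joint itself
                if point_segment_distance(p, joints[i0], joints[i1]) < r:
                    labels[j] = True
                    break
            if labels[j]:
                break
    return labels

# Toy example: an arm-like capsule in front of a joint occludes it.
joints = np.array([[0., 0., 5.], [0., 1., 4.], [0., -1., 4.]])
print(occlusion_labels(joints, limbs=[(1, 2)], radii=[0.3],
                       cam=np.zeros(3)))  # [ True False False]
```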
