View-Invariant Probabilistic Embedding for Human Pose

Depictions of similar human body configurations can vary with changing viewpoints. Using only 2D information, we would like to enable vision algorithms to recognize similarity in human body poses across multiple views. This ability is useful for analyzing body movements and human behaviors in images and videos. In this paper, we propose an approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses. Since 2D poses are projected from 3D space, they have an inherent ambiguity, which is difficult to represent through a deterministic mapping. Hence, we use probabilistic embeddings to model this input uncertainty. Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views, in comparison with 2D-to-3D pose lifting models. We also demonstrate the effectiveness of applying our embeddings to view-invariant action recognition and video alignment. Our code is available at https://github.com/google-research/google-research/tree/master/poem.
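The idea of a probabilistic embedding can be illustrated with a small sketch. This page does not describe the paper's actual model, loss, or distance function, so everything below is an assumption made for illustration: each pose is represented by a hypothetical diagonal Gaussian (a mean and a standard deviation per dimension), and two poses are compared by averaging the Euclidean distance over sampled embeddings. The names `sample_embedding` and `expected_distance` are hypothetical and are not taken from the released code.

```python
import math
import random


def sample_embedding(mean, std, rng):
    """Draw one sample from a diagonal Gaussian embedding (assumed form)."""
    return [rng.gauss(m, s) for m, s in zip(mean, std)]


def expected_distance(mean_a, std_a, mean_b, std_b, num_samples=200, seed=0):
    """Monte Carlo estimate of the mean Euclidean distance between two
    Gaussian embeddings. Illustrative only; not the paper's formulation."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        z_a = sample_embedding(mean_a, std_a, rng)
        z_b = sample_embedding(mean_b, std_b, rng)
        total += math.dist(z_a, z_b)
    return total / num_samples


# With zero variance the estimate reduces to the ordinary point distance.
certain = expected_distance([0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [0.0, 0.0])

# With nonzero variance, even identical means give a positive expected
# distance, which reflects the uncertainty carried by each embedding.
uncertain = expected_distance([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

A deterministic embedding corresponds to the zero-variance case. The per-dimension standard deviation is what would let a model express that an ambiguous 2D pose could correspond to several 3D configurations.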

ECCV 2020

Results from the Paper


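| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Pose Retrieval | Human3.6M | Pr-VIPE | Hit@1 | 76.2 | # 1 |
| Pose Retrieval | Human3.6M | Pr-VIPE | Hit@10 | 95.6 | # 1 |
| Pose Retrieval | MPI-INF-3DHP | Pr-VIPE | Hit@1 | 26.4 | # 1 |
| Pose Retrieval | MPI-INF-3DHP | Pr-VIPE | Hit@10 | 58.6 | # 1 |
| Video Alignment | UPenn Action | Pr-VIPE | Kendall's Tau | 0.7476 | # 2 |
| Skeleton Based Action Recognition | UPenn Action | Pr-VIPE | Accuracy | 97.5 | # 2 |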
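For readers unfamiliar with the Hit@k retrieval metric reported above, the following is a minimal sketch of how such a score is commonly computed: a query counts as a hit if at least one of its k nearest index items is a correct match. The function name `hit_at_k`, the precomputed distance matrix, and the boolean match matrix are assumptions for illustration. How the paper decides whether two poses match is not stated on this page.

```python
def hit_at_k(distances, is_match, k):
    """Percentage of queries with at least one correct match among
    their k nearest index items.

    distances[i][j]: embedding distance from query i to index item j.
    is_match[i][j]:  True if index item j is a correct match for query i
                     (the matching criterion is assumed to be given).
    """
    hits = 0
    for dist_row, match_row in zip(distances, is_match):
        # Indices of the k index items closest to this query.
        nearest = sorted(range(len(dist_row)), key=lambda j: dist_row[j])[:k]
        if any(match_row[j] for j in nearest):
            hits += 1
    return 100.0 * hits / len(distances)


# Toy data: 3 queries, 3 index items (values are made up).
distances = [
    [0.1, 0.5, 0.9],
    [0.7, 0.2, 0.4],
    [0.3, 0.8, 0.6],
]
is_match = [
    [True, False, False],
    [False, False, True],
    [False, True, False],
]

hit1 = hit_at_k(distances, is_match, 1)
hit2 = hit_at_k(distances, is_match, 2)
hit3 = hit_at_k(distances, is_match, 3)
```

On this toy data only the first query finds its match at rank 1, the second finds it at rank 2, and the third at rank 3, so the score rises with k.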
