Deep Convolutional Poses for Human Interaction Recognition in Monocular Videos

13 Dec 2016 · Marcel Sheeny de Moraes, Sankha Mukherjee, Neil M. Robertson

Human interaction recognition is a challenging problem in computer vision that has been studied for years because of its important applications. Given the development of deep models for human pose estimation, this work aims to verify how effective the human pose is for recognizing human interactions in monocular videos. The paper develops a method with five steps: detect each person in the scene, track them, retrieve the human pose, extract features based on the pose, and finally recognize the interaction using a classifier. The Two-Person Interaction dataset was used to develop this methodology. Using a whole-sequence evaluation approach, the method achieved an average accuracy of 87.56% over all interactions. Yun et al. achieved 91.10% on the same dataset; however, their methodology used a depth sensor to recognize the interaction. The methodology developed in this paper shows that, using recent deep models for human pose estimation, an RGB camera can be as effective as depth cameras at recognizing the interaction between two persons.
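The last two steps of the pipeline (pose-based features, then a classifier) can be illustrated with a minimal sketch. The abstract does not say which features or which classifier the authors used, so everything here is an assumption made for illustration: the features are Euclidean distances between the joints of the two people, a whole sequence is summarized by averaging those features over its frames, and a 1-nearest-neighbour rule stands in for the classifier. The function names, the two-joint poses, and the labels are hypothetical.

```python
import math

# ASSUMPTION: a pose is a list of (x, y) joint coordinates produced by some
# pose estimator; the joint count and ordering are hypothetical.


def joint_distance_features(pose_a, pose_b):
    """Distances between every joint of person A and every joint of person B."""
    return [math.dist(ja, jb) for ja in pose_a for jb in pose_b]


def sequence_features(frames):
    """Average the per-frame features over a whole sequence.

    `frames` is a list of (pose_a, pose_b) pairs, one pair per video frame.
    ASSUMPTION: averaging is one simple way to summarize a sequence; the
    paper's whole-sequence evaluation may work differently.
    """
    per_frame = [joint_distance_features(a, b) for a, b in frames]
    count = len(per_frame)
    return [sum(column) / count for column in zip(*per_frame)]


def nearest_neighbour(training_set, query):
    """1-NN stand-in for the classifier (hypothetical choice).

    `training_set` is a list of (feature_vector, label) pairs.
    """
    closest = min(training_set, key=lambda item: math.dist(item[0], query))
    return closest[1]


# Toy usage with two-joint poses and made-up coordinates.
far_apart = [([(0, 0), (0, 1)], [(30, 0), (30, 1)])] * 3
close_by = [([(0, 0), (0, 1)], [(1, 0), (1, 1)])] * 3
training_set = [
    (sequence_features(far_apart), "far"),
    (sequence_features(close_by), "near"),
]
query = sequence_features([([(0, 0), (0, 1)], [(2, 0), (2, 1)])])
predicted = nearest_neighbour(training_set, query)
```

In this toy setup the query sequence has the two people two units apart, so its feature vector lies closer to the "near" training example than to the "far" one. A real system would replace the hand-written coordinates with the output of the detection, tracking, and pose-estimation steps.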
