Enhanced Spatio-Temporal Image Encoding for Online Human Activity Recognition

Human Activity Recognition (HAR) based on sensor data can be seen as a time series classification problem in which the challenge is to handle both spatial and temporal dependencies while focusing on the most relevant data variations. It can be performed using 3D skeleton data extracted from an RGB+D camera. In this work, we propose to improve the spatio-temporal image encoding of 3D skeletons captured from a Kinect sensor by studying the concept of motion energy, which focuses mainly on the skeleton joints that are most involved in an action. This encoding achieves better discrimination in online activity detection by focusing on the most significant parts of the actions. The article presents this new encoding and its application to HAR using a deep learning model trained on the encoded 3D skeleton data. For this purpose, we investigate the knowledge transferability of several pre-trained CNNs provided by Keras. The article shows a significant improvement in accuracy over the state of the art.
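The abstract does not give the encoding's formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: motion energy is taken to be the summed squared frame-to-frame displacement of each joint, the most energetic joints are kept, and their x, y, z coordinates are mapped to the three channels of an image (rows are joints, columns are frames). The function names, the `top_k` parameter, and the 25-joint layout are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def motion_energy(seq):
    """Per-joint energy for a (frames, joints, 3) array.

    Assumed definition: sum over time of squared frame-to-frame displacement.
    """
    disp = np.diff(seq, axis=0)            # (frames-1, joints, 3)
    return (disp ** 2).sum(axis=(0, 2))    # (joints,)

def encode_sequence(seq, top_k):
    """Encode the top_k most energetic joints as a uint8 image.

    Returns (image, kept) where image has shape (top_k, frames, 3)
    and kept holds the selected joint indices in ascending order.
    """
    energy = motion_energy(seq)
    kept = np.sort(np.argsort(energy)[-top_k:])
    sel = seq[:, kept, :]                  # (frames, top_k, 3)

    # Min-max normalise each coordinate axis to [0, 255].
    lo = sel.min(axis=(0, 1), keepdims=True)
    hi = sel.max(axis=(0, 1), keepdims=True)
    scale = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    img = (sel - lo) / scale * 255.0

    # Rows = joints, columns = frames, channels = x, y, z.
    return np.transpose(img, (1, 0, 2)).astype(np.uint8), kept

# Synthetic sequence: 40 frames, 25 joints, only joints 5 and 11 move.
rng = np.random.default_rng(0)
seq = np.zeros((40, 25, 3))
seq[:, 5, :] = rng.normal(size=(40, 3))
seq[:, 11, :] = 0.5 * rng.normal(size=(40, 3))

image, kept = encode_sequence(seq, top_k=2)
```

An image produced this way could then be resized and passed to a pre-trained CNN for fine-tuning; that step is omitted here because it depends on the chosen network's input size.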

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
| Human Activity Recognition | OAD dataset | ESTIE + VGG16 (transfer-learning) | Accuracy | 95.22 | # 1 |
| Human Activity Recognition | OAD dataset | STIE + VGG16 (transfer-learning) | Accuracy | 94.77 | # 2 |