Human Activity Recognition: A Spatio-temporal Image Encoding of 3D Skeleton Data for Online Action Detection

Human activity recognition (HAR) based on skeleton data that can be extracted from videos (Kinect for example) , or provided by a depth camera is a time series classification problem, where handling both spatial and temporal dependencies is a crucial task, in order to achieve a good recognition. In the online human activity recognition, identifying the beginning and end of an action is an important element, that might be difficult in a continuous data flow. In this work, we present a 3D skeleton data encoding method to generate an image that preserves the spatial and temporal dependencies existing between the skeletal joints.To allow online action detection we combine this encoding system with a sliding window on the continous data stream. By this way, no start or stop timestamp is needed and the recognition can be done at any moment. A deep learning CNN algorithm is used to achieve actions online detection.

PDF

Datasets


Task Dataset Model Metric Name Metric Value Global Rank Benchmark
Human Activity Recognition OAD dataset STIE + VGG16(fine-tuning) Accuracy 86.81 # 3

Methods