[Image credit: Rahmani et al.]
Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems.
Our experiments show that (a) state-of-the-art 3D convolutional neural networks obtain disappointing results on such videos, highlighting that they lack a true understanding of human actions, and (b) models that leverage body language via human pose are less prone to context biases.
In this work, we propose to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition.
Ranked #1 on Multimodal Activity Recognition on EV-Action
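The core of a Temporal Convolutional Network is a 1D convolution slid along the time axis of a skeleton sequence, so each output step summarizes a short window of frames. A minimal numpy sketch of that operation follows; the clip length, kernel size, and the 25-joints-times-3-coordinates channel layout are illustrative assumptions, not details taken from the paper above.

```python
import numpy as np

def temporal_conv(x, kernels, stride=1):
    """1D convolution along the time axis.
    x: (T, C_in) frame sequence; kernels: (C_out, K, C_in)."""
    T, C_in = x.shape
    C_out, K, _ = kernels.shape
    out_len = (T - K) // stride + 1
    out = np.empty((out_len, C_out))
    for t in range(out_len):
        window = x[t * stride : t * stride + K]              # (K, C_in)
        # Dot each kernel with the window over its (K, C_in) axes.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return out

# Hypothetical sizes: 30-frame clip, 25 joints x 3 coords = 75 channels.
rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 75))
w = rng.standard_normal((64, 9, 75)) * 0.01   # 64 filters, kernel size 9
feat = temporal_conv(seq, w)
print(feat.shape)  # (22, 64)
```

In a full TCN, several such layers are stacked with nonlinearities so that deeper layers cover progressively longer temporal windows.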
Recent approaches to depth-based human activity analysis have achieved outstanding performance and proved the effectiveness of 3D representations for the classification of action classes.
Each available 3DV voxel intrinsically encodes 3D spatial and motion features jointly.
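The idea of a voxel jointly carrying position and motion can be sketched as follows: each frame's point cloud is rasterized into an occupancy grid, and the grids are fused with time-dependent weights so that a voxel's value indicates when it was occupied. This is a deliberate simplification (a linear weighting standing in for 3DV's temporal rank pooling), and the grid size and clip length are arbitrary assumptions.

```python
import numpy as np

def voxelize(points, grid=16, lo=-1.0, hi=1.0):
    """Binary occupancy grid from an (N, 3) point cloud in [lo, hi]^3."""
    vox = np.zeros((grid,) * 3, dtype=np.float32)
    idx = ((points - lo) / (hi - lo) * grid).astype(int)
    idx = np.clip(idx, 0, grid - 1)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return vox

def motion_voxels(frames, grid=16):
    """Fuse per-frame occupancy into one grid whose values reflect *when*
    each voxel was occupied (early frames negative, late frames positive).
    A simplified stand-in for 3DV's temporal rank pooling."""
    weights = np.linspace(-1.0, 1.0, len(frames))
    stack = np.stack([voxelize(f, grid) for f in frames])  # (T, g, g, g)
    return np.tensordot(weights, stack, axes=1)            # (g, g, g)

rng = np.random.default_rng(1)
clip = [rng.uniform(-1, 1, (500, 3)) for _ in range(8)]    # toy point clouds
v = motion_voxels(clip)
print(v.shape)  # (16, 16, 16)
```

A voxel occupied only late in the clip ends up positive, one occupied only early ends up negative, and a static voxel averages toward zero, so spatial layout and motion are read off the same grid.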
The proposed representation has the advantage of combining the use of reference joints and a tree structure skeleton.
Ranked #3 on Action Recognition on NTU RGB+D 120
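Combining reference joints with a tree-structured skeleton can be illustrated concretely: joints are ordered by a depth-first traversal of the kinematic tree (so physically adjacent joints stay adjacent in the feature vector) and each joint's coordinates are expressed relative to a reference joint. The toy 7-joint tree, the joint names, and the choice of the spine as reference are all assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

# Hypothetical kinematic tree for a toy 7-joint skeleton.
TREE = {
    "spine":      ["neck"],
    "neck":       ["head", "shoulder_l", "shoulder_r"],
    "shoulder_l": ["hand_l"],
    "shoulder_r": ["hand_r"],
}

def tree_order(root="spine"):
    """Depth-first traversal: neighboring joints stay close in the ordering."""
    order, stack = [], [root]
    while stack:
        joint = stack.pop()
        order.append(joint)
        stack.extend(reversed(TREE.get(joint, [])))
    return order

def encode(joints, ref="spine"):
    """Concatenate coordinates relative to a reference joint, in tree order."""
    r = joints[ref]
    return np.concatenate([joints[j] - r for j in tree_order()])

rng = np.random.default_rng(2)
pose = {j: rng.uniform(size=3) for j in
        ["spine", "neck", "head", "shoulder_l", "hand_l",
         "shoulder_r", "hand_r"]}
vec = encode(pose)
print(vec.shape)  # (21,) -- 7 joints x 3 relative coordinates
```

Subtracting the reference joint removes the global body position, while the traversal order preserves the skeleton's connectivity in the flattened feature.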
Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently attracted the attention of the computer vision community.
Ranked #4 on Action Recognition on NTU RGB+D 120
The proposed method achieved state-of-the-art performance on the NTU RGB+D dataset for 3D human action analysis.
Ranked #50 on Skeleton Based Action Recognition on NTU RGB+D