Spatial Temporal Transformer Network for Skeleton-based Action Recognition

Skeleton-based Human Activity Recognition has attracted great interest in recent years, as skeleton data has been shown to be robust to illumination changes, body scale, dynamic camera views, and complex backgrounds. In particular, Spatial-Temporal Graph Convolutional Networks (ST-GCN) have proven effective at learning both spatial and temporal dependencies on non-Euclidean data such as skeleton graphs...
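To make "spatial and temporal dependencies on skeleton graphs" concrete, here is a minimal sketch of one ST-GCN-style step in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the 5-joint skeleton, the function names, and the fixed averaging window are all hypothetical, and the learnable weights that a real ST-GCN layer applies are omitted.

```python
import math

# Hypothetical 5-joint toy skeleton (not from the paper); each pair is a bone.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(num_joints, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a nested list (self-loops included)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def spatial_aggregate(frame, adj):
    """Mix each joint's features with its neighbours' within one frame."""
    channels = len(frame[0])
    return [[sum(adj[i][j] * frame[j][c] for j in range(len(frame)))
             for c in range(channels)]
            for i in range(len(frame))]


def temporal_average(sequence, window=3):
    """Average each joint's features over a centred window of frames."""
    out = []
    for t in range(len(sequence)):
        lo = max(0, t - window // 2)
        hi = min(len(sequence), t + window // 2 + 1)
        frames = sequence[lo:hi]
        out.append([[sum(f[i][c] for f in frames) / len(frames)
                     for c in range(len(frames[0][0]))]
                    for i in range(len(frames[0]))])
    return out


def st_block(sequence, adj, window=3):
    """Spatial aggregation per frame, then temporal smoothing per joint."""
    return temporal_average([spatial_aggregate(f, adj) for f in sequence], window)
```

The input `sequence` is indexed as frames, then joints, then channels, and the output has the same shape. A trained model would replace the fixed averaging with learned convolution kernels; the sketch only shows how information flows along bones within a frame and across neighbouring frames.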

TASK                               DATASET                    MODEL       METRIC NAME               METRIC VALUE (%)  GLOBAL RANK
Skeleton Based Action Recognition  Kinetics-Skeleton dataset  ST-TR-agcn  Accuracy                  37.4              #5
Skeleton Based Action Recognition  NTU RGB+D                  ST-TR       Accuracy (CV)             96.1              #7
Skeleton Based Action Recognition  NTU RGB+D                  ST-TR       Accuracy (CS)             89.9              #11
Skeleton Based Action Recognition  NTU RGB+D 120              ST-TR-agcn  Accuracy (Cross-Subject)  82.7              #9
Skeleton Based Action Recognition  NTU RGB+D 120              ST-TR-agcn  Accuracy (Cross-Setup)    84.7              #9

Methods used in the Paper