Multi-GAT: A Graphical Attention-based Hierarchical Multimodal Representation Learning Approach for Human Activity Recognition

Recognizing human activities is one of the crucial capabilities a robot needs to be useful around people. Although modern robots are equipped with various types of sensors, human activity recognition (HAR) remains a challenging problem, particularly in the presence of noisy sensor data. In this work, we introduce a multimodal graphical attention-based HAR approach, called Multi-GAT, which hierarchically learns complementary multimodal features. We develop a multimodal mixture-of-experts model to disentangle and extract salient modality-specific features that enable feature interactions. Additionally, we introduce a novel message-passing-based graphical attention approach that captures cross-modal relations to extract complementary multimodal features. Experimental results on two multimodal human activity datasets suggest that Multi-GAT outperforms state-of-the-art HAR algorithms across all datasets and metrics tested. Finally, experimental results with noisy sensor data indicate that Multi-GAT consistently outperforms all the evaluated baselines. This robust performance suggests that Multi-GAT can enable seamless human-robot collaboration in noisy human environments.
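
To make the idea of attention-based message passing across modalities concrete, the following is a minimal sketch of one generic GAT-style attention round over a fully connected graph whose nodes are modality feature vectors. It is not the Multi-GAT implementation: the modality names, feature values, the attention vectors `a_src` and `a_dst`, and the function `cross_modal_attention` are all hypothetical, and the learned linear projection used in standard graph attention layers is omitted for brevity.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_modal_attention(features, a_src, a_dst):
    """One round of GAT-style message passing over a fully connected
    modality graph (self-loops included). Hypothetical helper."""
    names = list(features)
    fused, weights = {}, {}
    for i in names:
        # Unnormalized score of node i attending to every node j.
        scores = [
            leaky_relu(dot(a_src, features[i]) + dot(a_dst, features[j]))
            for j in names
        ]
        alpha = softmax(scores)
        weights[i] = dict(zip(names, alpha))
        dim = len(features[i])
        # Attention-weighted sum of all modality features.
        fused[i] = [
            sum(alpha[k] * features[j][d] for k, j in enumerate(names))
            for d in range(dim)
        ]
    return fused, weights

# Illustrative (made-up) modality features and attention parameters.
features = {
    "rgb": [0.9, 0.1, 0.3],
    "skeleton": [0.2, 0.8, 0.5],
    "inertial": [0.4, 0.4, 0.7],
}
a_src = [0.5, -0.2, 0.1]
a_dst = [0.3, 0.6, -0.4]

fused, weights = cross_modal_attention(features, a_src, a_dst)
for name in features:
    print(name, [round(w, 3) for w in weights[name].values()])
```

Each modality's fused vector is a convex combination of all modality vectors, so a noisy modality can in principle be down-weighted when the attention parameters are learned; in this sketch the parameters are fixed by hand.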


Datasets


Task                            | Dataset | Model     | Metric Name              | Metric Value | Global Rank
Multimodal Activity Recognition | MMAct   | Multi-GAT | F1-Score (Cross-Subject) | 75.24        | #1
Multimodal Activity Recognition | MMAct   | Multi-GAT | F1-Score (Cross-Session) | 91.48        | #2
