Fusion-GCN: Multimodal Action Recognition using Graph Convolutional Networks

27 Sep 2021 · Michael Duhme, Raphael Memmesheimer, Dietrich Paulus

In this paper, we present Fusion-GCN, an approach for multimodal action recognition using Graph Convolutional Networks (GCNs). GCN-based methods have recently yielded state-of-the-art performance for skeleton-based action recognition. With Fusion-GCN, we propose to integrate various sensor data modalities into a single graph that is trained with a GCN model for multimodal action recognition. Additional sensor measurements are incorporated into the graph representation either on the channel dimension (introducing additional node attributes) or on the spatial dimension (introducing new nodes). Fusion-GCN was evaluated on two publicly available datasets, UTD-MHAD and MMAct, and demonstrates flexible fusion of RGB sequences, inertial measurements, and skeleton sequences. Our approach achieves comparable results on the UTD-MHAD dataset and improves the baseline on the large-scale MMAct dataset by a significant margin of up to 12.37% (F1-Measure) with the fusion of skeleton estimates and accelerometer measurements.
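To make the two fusion strategies concrete, below is a minimal sketch of how inertial measurements could be merged into a skeleton graph on either dimension. All shapes, joint counts, and the hub-joint index are illustrative assumptions, not taken from the authors' code; the IMU stream is assumed to be already resampled to the skeleton frame rate.

```python
import numpy as np

# Hypothetical shapes (assumed for illustration):
# skeleton: (T, V, C) = (frames, joints, coordinate channels)
# imu:      (T, C_imu) = (frames, accelerometer channels)
T, V, C = 100, 20, 3   # e.g. 100 frames, 20 joints, xyz coordinates
C_imu = 3              # accelerometer x/y/z

skeleton = np.random.randn(T, V, C).astype(np.float32)
imu = np.random.randn(T, C_imu).astype(np.float32)

def fuse_channel_dim(skeleton, imu):
    """Channel-dimension fusion: append the IMU signal to every joint's
    feature vector, turning (T, V, C) into (T, V, C + C_imu)."""
    imu_broadcast = np.repeat(imu[:, None, :], skeleton.shape[1], axis=1)
    return np.concatenate([skeleton, imu_broadcast], axis=-1)

def fuse_spatial_dim(skeleton, imu):
    """Spatial-dimension fusion: treat the IMU as one extra graph node,
    turning (T, V, C) into (T, V + 1, C). Assumes C_imu == C."""
    imu_node = imu[:, None, :]  # (T, 1, C)
    return np.concatenate([skeleton, imu_node], axis=1)

def extend_adjacency(A, hub=7):
    """Spatial fusion also needs one extra row/column in the adjacency
    matrix; here the new IMU node is hypothetically wired to a single
    hub joint (e.g. the wrist wearing the sensor)."""
    V = A.shape[0]
    A_ext = np.zeros((V + 1, V + 1), dtype=A.dtype)
    A_ext[:V, :V] = A
    A_ext[V, hub] = A_ext[hub, V] = 1.0
    return A_ext

fused_c = fuse_channel_dim(skeleton, imu)  # (100, 20, 6)
fused_v = fuse_spatial_dim(skeleton, imu)  # (100, 21, 3)
print(fused_c.shape, fused_v.shape)
```

Either fused tensor can then be fed to a standard spatio-temporal GCN; channel fusion keeps the graph topology unchanged, while spatial fusion grows the graph and its adjacency matrix by one node per added sensor.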


Datasets

UTD-MHAD · MMAct
Results from the Paper


Ranked #3 on Multimodal Activity Recognition on MMAct (F1-Score (Cross-Subject) metric)

Task:         Multimodal Activity Recognition
Dataset:      MMAct
Model:        Fusion-GCN
Metric:       F1-Score (Cross-Subject)
Metric Value: 89.60
Global Rank:  #3

Methods

Graph Convolutional Network (GCN)