Multimodal Activity Recognition
12 papers with code • 10 benchmarks • 7 datasets
Latest papers
OPERAnet: A Multimodal Activity Recognition Dataset Acquired from Radio Frequency and Vision-based Sensors
This dataset can be exploited to advance WiFi- and vision-based HAR, for example through pattern recognition, skeletal representations, deep learning, or other novel approaches for accurately recognizing human activities.
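As one concrete (and deliberately simple) illustration of how the two sensing streams might be combined, the sketch below late-fuses class probabilities from an RF branch and a vision branch. The class count, fusion weight, and random stand-in logits are placeholders, not anything from the dataset paper.

```python
import numpy as np

# Late fusion of two modality-specific classifiers (illustrative only).
# The class count and the random "logits" are placeholders.
rng = np.random.default_rng(0)
NUM_CLASSES = 6  # e.g. sit, stand, walk, lie down, ...

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for per-window logits from an RF (WiFi CSI) model and a vision model.
rf_logits = rng.normal(size=(4, NUM_CLASSES))
vision_logits = rng.normal(size=(4, NUM_CLASSES))

# Weighted average of per-modality probabilities; the weight would normally
# be tuned on a validation split.
alpha = 0.5
fused = alpha * softmax(rf_logits) + (1 - alpha) * softmax(vision_logits)
print(fused.argmax(axis=-1))  # fused activity predictions per window
```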
Fusion-GCN: Multimodal Action Recognition using Graph Convolutional Networks
In this paper, we present Fusion-GCN, an approach for multimodal action recognition using Graph Convolutional Networks (GCNs).
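Fusion-GCN integrates additional sensor modalities into a skeleton-graph representation processed by a GCN. Below is a minimal sketch of one such fusion scheme, early fusion at the feature level: an IMU reading is broadcast to every skeleton joint before a single graph-convolution step. The dimensions, chain-shaped adjacency, and weights are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
J, D_SKEL, D_IMU, D_OUT = 5, 3, 6, 16  # joints and feature sizes (illustrative)

# Per-joint skeleton coordinates plus one IMU vector for the same frame.
skel = rng.normal(size=(J, D_SKEL))
imu = rng.normal(size=(D_IMU,))

# Early fusion at the feature level: broadcast the IMU reading to every joint
# and concatenate it with the joint's own features.
x = np.concatenate([skel, np.tile(imu, (J, 1))], axis=1)  # (J, D_SKEL + D_IMU)

# Toy skeleton graph (a chain of joints) with self-loops, row-normalized.
A = np.eye(J)
for i in range(J - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_hat = A / A.sum(axis=1, keepdims=True)

# One graph-convolution step over the fused node features: ReLU(A_hat X W).
W = rng.normal(size=(D_SKEL + D_IMU, D_OUT)) * 0.1
h = np.maximum(A_hat @ x @ W, 0.0)
print(h.shape)  # (5, 16): fused per-joint features for deeper GCN layers
```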
Distilling Audio-Visual Knowledge by Compositional Contrastive Learning
Having access to multi-modal cues (e.g., vision and audio) enables some cognitive tasks to be performed faster than learning from a single modality would allow.
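As a point of reference for the distillation setup (not the paper's compositional formulation), the sketch below computes a generic InfoNCE-style contrastive loss that pulls a unimodal student embedding toward its matching multimodal teacher embedding. The batch size, embedding dimension, and temperature are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, TAU = 8, 32, 0.07  # batch size, embedding dim, temperature (illustrative)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for embeddings from a multimodal (audio-visual) teacher and a
# visual-only student; matched rows form the positive pairs.
teacher = l2_normalize(rng.normal(size=(B, D)))
student = l2_normalize(rng.normal(size=(B, D)))

# InfoNCE: each student embedding should be most similar to its own teacher row,
# with every other row in the batch serving as a negative.
logits = student @ teacher.T / TAU                                  # (B, B)
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))  # cross-entropy with diagonal targets
print(f"contrastive distillation loss: {loss:.3f}")
```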
Gimme Signals: Discriminative signal encoding for multimodal activity recognition
We present a simple, yet effective and flexible method for action recognition supporting multiple sensor modalities.
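The core idea is to encode arbitrary sensor signals as images so a standard CNN can classify them. A hedged sketch of one plausible encoding follows: each channel of a multichannel time series is min-max normalized and rendered as a horizontal band of a grayscale image. The function name and parameters are illustrative; the paper's exact encoding may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def signal_to_image(signal, height=64):
    """Encode a (channels, time) signal as an 8-bit grayscale image: one
    horizontal band per channel, intensity = min-max normalized amplitude.
    Illustrative only; assumes height is a multiple of the channel count."""
    c, t = signal.shape
    lo = signal.min(axis=1, keepdims=True)
    hi = signal.max(axis=1, keepdims=True)
    norm = (signal - lo) / np.maximum(hi - lo, 1e-8)  # per-channel [0, 1]
    rows = np.repeat(norm, height // c, axis=0)       # stretch channels vertically
    return (rows * 255).astype(np.uint8)              # CNN-ready image

# Example: a 4-channel IMU-like recording of 128 samples.
img = signal_to_image(rng.normal(size=(4, 128)))
print(img.shape, img.dtype)  # (64, 128) uint8
```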
Bayesian Hierarchical Dynamic Model for Human Action Recognition
Human action recognition remains a challenging task, partly due to large variations in how actions are executed.
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures
Learning to represent videos is a very challenging task both algorithmically and computationally.
EV-Action: Electromyography-Vision Multi-Modal Action Dataset
To fill this gap, we introduce the new, large-scale EV-Action dataset, which consists of RGB, depth, electromyography (EMG), and two skeleton modalities.
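To make the modality list concrete, here is a hypothetical container for one synchronized sample from such a dataset; all field names, shapes, and sampling rates are invented placeholders, not EV-Action's actual format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimodalSample:
    """One synchronized action clip; shapes are illustrative placeholders."""
    rgb: np.ndarray              # (T, H, W, 3) video frames
    depth: np.ndarray            # (T, H, W) depth maps
    emg: np.ndarray              # (T_emg, channels) EMG, higher sampling rate
    skeleton_kinect: np.ndarray  # (T, joints, 3) camera-derived skeleton
    skeleton_mocap: np.ndarray   # (T_mocap, markers, 3) motion-capture skeleton
    label: int

sample = MultimodalSample(
    rgb=np.zeros((30, 120, 160, 3), dtype=np.uint8),
    depth=np.zeros((30, 120, 160), dtype=np.uint16),
    emg=np.zeros((3000, 4), dtype=np.float32),
    skeleton_kinect=np.zeros((30, 25, 3), dtype=np.float32),
    skeleton_mocap=np.zeros((300, 39, 3), dtype=np.float32),
    label=0,
)
print(sample.rgb.shape, sample.emg.shape)
```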
Cross-modal Learning by Hallucinating Missing Modalities in RGB-D Vision
We report state-of-the-art or comparable results on video action recognition on the largest multimodal dataset available for this task, NTU RGB+D, as well as on the UWA3DII and Northwestern-UCLA datasets.
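The usual hallucination-network recipe, sketched here in a heavily simplified form (linear maps instead of CNNs, and not this paper's exact loss): an RGB "hallucination" branch is trained to regress the features of a depth branch, so the depth modality can be missing at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D_IN, D_FEAT = 8, 64, 32  # batch, input dim, feature dim (illustrative)

# Stand-in feature extractors: a frozen depth network ("teacher") and a
# trainable RGB hallucination network ("student"), both just linear maps here.
W_depth = rng.normal(size=(D_IN, D_FEAT)) * 0.1   # frozen
W_halluc = rng.normal(size=(D_IN, D_FEAT)) * 0.1  # learned

rgb = rng.normal(size=(B, D_IN))
depth = rng.normal(size=(B, D_IN))

lr = 1e-2
for step in range(200):
    target = depth @ W_depth   # depth features (no gradient flows here)
    pred = rgb @ W_halluc      # hallucinated "depth" features from RGB alone
    diff = pred - target
    loss = (diff ** 2).mean()  # L2 hallucination loss
    # Manual gradient step for the linear student: dL/dW = 2/(B*D) * X^T diff.
    W_halluc -= lr * rgb.T @ diff * (2 / (B * D_FEAT))
print(f"final hallucination loss: {loss:.4f}")
```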
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Dynamics of human body skeletons convey significant information for human action recognition.
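The building block of ST-GCN alternates a spatial graph convolution over the skeleton's joint adjacency with a temporal convolution along the frame axis. The sketch below shows that pattern with a toy chain skeleton and a single unpartitioned adjacency; the actual model uses a partitioned adjacency and learned edge-importance weighting.

```python
import numpy as np

rng = np.random.default_rng(0)
T, J, C_IN, C_OUT, K = 16, 5, 3, 8, 3  # frames, joints, channels, temporal kernel

# Toy skeleton graph (a chain) with self-loops, symmetrically normalized.
A = np.eye(J)
for i in range(J - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
d = A.sum(axis=1)
A_hat = A / np.sqrt(np.outer(d, d))

x = rng.normal(size=(T, J, C_IN))           # skeleton sequence
W_s = rng.normal(size=(C_IN, C_OUT)) * 0.1  # spatial projection
W_t = rng.normal(size=(K, C_OUT)) * 0.1     # depthwise temporal kernel

# Spatial graph convolution at every frame: mix joints via A_hat, then project.
h = np.einsum("jk,tkc->tjc", A_hat, x) @ W_s  # (T, J, C_OUT)

# Temporal convolution per joint and channel (valid padding, kernel size K).
out = np.zeros((T - K + 1, J, C_OUT))
for t in range(T - K + 1):
    out[t] = np.einsum("kjc,kc->jc", h[t:t + K], W_t)
print(out.shape)  # (14, 5, 8): spatio-temporal features for the next block
```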
Moments in Time Dataset: one million videos for event understanding
We present the Moments in Time Dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds.