Egocentric Activity Recognition
14 papers with code • 2 benchmarks • 4 datasets
Latest papers
WEAR: An Outdoor Sports Dataset for Wearable and Egocentric Activity Recognition
Though research has shown the complementarity of camera- and inertial-based data, datasets which offer both egocentric video and inertial-based sensor data remain scarce.
Towards Continual Egocentric Activity Recognition: A Multi-modal Egocentric Activity Dataset for Continual Learning
However, the scarcity of suitable datasets hinders the development of multi-modal deep learning for egocentric activity recognition.
Learning Video Representations from Large Language Models
We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs).
Group Contextualization for Video Recognition
By using calibrators to embed features with four different kinds of context in parallel, the learned representation is expected to be more resilient to diverse types of activities.
Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
We introduce an approach for pre-training egocentric video models using large-scale third-person video datasets.
Integrating Human Gaze into Attention for Egocentric Activity Recognition
In addition, we model the distribution of gaze fixations using a variational method.
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition
We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal binding, i.e., the combination of modalities within a range of temporal offsets.
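The idea of binding modalities within a range of temporal offsets can be illustrated with a minimal NumPy sketch. This is not the paper's implementation; the function name, feature shapes, and the concatenate-then-pool fusion are illustrative assumptions. Each visual timestep is paired with an audio feature drawn from within a small temporal window rather than forcing exact synchrony.

```python
import numpy as np

def temporal_binding_fusion(rgb, audio, max_offset=2, rng=None):
    """Toy sketch of temporal binding (hypothetical helper, not EPIC-Fusion's code):
    pair each visual timestep with an audio timestep sampled within
    +/- max_offset, then concatenate and mean-pool into a clip descriptor.

    rgb:   (T, D1) per-timestep visual features
    audio: (T, D2) per-timestep audio features
    """
    rng = rng or np.random.default_rng(0)
    T = rgb.shape[0]
    # Random offsets in [-max_offset, max_offset] for each timestep.
    offsets = rng.integers(-max_offset, max_offset + 1, size=T)
    idx = np.clip(np.arange(T) + offsets, 0, T - 1)
    fused = np.concatenate([rgb, audio[idx]], axis=1)  # (T, D1 + D2)
    return fused.mean(axis=0)  # clip-level descriptor
```

Sampling the offset rather than fixing it exposes the fusion layer to many audio-visual alignments within the binding window, which is the intuition behind training with a range of temporal offsets.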
What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention
Our method is ranked first in the public leaderboard of the EPIC-Kitchens egocentric action anticipation challenge 2019.
Large-scale weakly-supervised pre-training for video action recognition
Second, frame-based models perform quite well on action recognition; is pre-training for good image features sufficient or is pre-training for spatio-temporal features valuable for optimal transfer learning?
Long-Term Feature Banks for Detailed Video Understanding
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
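Relating the present to the past, as described above, can be sketched as a small feature bank: a bounded store of past clip features that the current clip queries with dot-product attention. The class name, capacity, and scaling are assumptions for illustration, not the paper's architecture.

```python
import numpy as np
from collections import deque

class FeatureBank:
    """Minimal sketch of a long-term feature bank (hypothetical interface):
    keep a bounded FIFO of past clip features and let the current clip
    attend over that history with scaled dot-product attention."""

    def __init__(self, dim, capacity=64):
        self.bank = deque(maxlen=capacity)  # oldest entries are evicted
        self.dim = dim

    def add(self, feat):
        """Store one past clip feature of shape (dim,)."""
        self.bank.append(np.asarray(feat, dtype=float))

    def attend(self, query):
        """Return a context vector: attention-weighted sum of the bank."""
        if not self.bank:
            return np.zeros(self.dim)
        mem = np.stack(self.bank)                    # (N, dim)
        scores = mem @ query / np.sqrt(self.dim)     # (N,)
        w = np.exp(scores - scores.max())
        w /= w.sum()                                 # softmax weights
        return w @ mem                               # (dim,) context
```

The bounded deque gives the "long-term" horizon (old clips fall out once capacity is reached), while the attention step lets the current observation pull in only the relevant parts of that history.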