Action Recognition
866 papers with code • 49 benchmarks • 104 datasets
Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify and categorize the actions being performed in the video or image into a predefined set of action classes.
In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.
Please note that some benchmarks, e.g. Kinetics-400, may be listed under the Action Classification or Video Classification tasks.
Libraries
Use these libraries to find Action Recognition models and implementations
Datasets
Subtasks
- Action Recognition In Videos
- Self-Supervised Action Recognition
- 3D Action Recognition
- Few Shot Action Recognition
- Fine-grained Action Recognition
- Action Triplet Recognition
- Open Set Action Recognition
- Weakly-Supervised Action Recognition
- Atomic Action Recognition
- Animal Action Recognition
- Transportation Mode Detection
- Open Vocabulary Action Recognition
- Action Recognition In Still Images
Latest papers with no code
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
We interact with the world with our hands and see it through our own (egocentric) perspective.
Emotion Recognition from the perspective of Activity Recognition
In this paper, we treat emotion recognition from the perspective of action recognition by exploring the application of deep learning architectures specifically designed for action recognition, for continuous affect recognition.
Hierarchical NeuroSymbolic Approach for Action Quality Assessment
Action quality assessment (AQA) applies computer vision to quantitatively assess the performance or execution of a human action.
Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Global obfuscation hides privacy sensitive regions, but also contextual regions important for action recognition.
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Then, we propose a conceptual reasoning-based uncertainty estimation module, which simulates the recognition process to enrich the semantic representation.
VideoBadminton: A Video Dataset for Badminton Action Recognition
In this paper, we introduce the VideoBadminton dataset derived from high-quality badminton footage.
Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception
In this paper, we propose a novel learning framework, where the weak labels are first used to train a multi-view video-based base model, which is subsequently used for downstream frame-level perception tasks.
A Survey of IMU Based Cross-Modal Transfer Learning in Human Activity Recognition
We also distinguish and expound on many related but inconsistently used terms in the literature, such as transfer learning, domain adaptation, representation learning, sensor fusion, and multimodal learning, and describe how cross-modal learning fits with all these concepts.
CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
Most existing one-shot skeleton-based action recognition focuses on raw low-level information (e.g., joint location), and may suffer from local information loss and low generalization ability.
Skeleton-Based Human Action Recognition with Noisy Labels
In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas to serve as the initial benchmark.