Action Recognition
883 papers with code • 49 benchmarks • 105 datasets
Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify and categorize the actions being performed in the video or image into a predefined set of action classes.
In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.
Please note some benchmarks may be located in the Action Classification or Video Classification tasks, e.g. Kinetics-400.
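At its core, the task maps a clip (a stack of frames) to one label from a predefined set of action classes. As a minimal illustrative sketch of that input/output contract (not any benchmarked model), the toy example below pools a clip over time and scores it with a random linear head; all shapes and the classifier itself are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 5  # size of the predefined action-class set (illustrative)

# A fake clip of 8 RGB frames (T, H, W, C) stands in for a real video.
clip = rng.standard_normal((8, 32, 32, 3))

# Random linear head mapping a pooled frame to class scores (untrained).
W = rng.standard_normal((clip[0].size, NUM_CLASSES)) * 0.01

def classify_clip(clip: np.ndarray, W: np.ndarray) -> int:
    """Average-pool frames over time, then score with a linear head."""
    pooled = clip.mean(axis=0).ravel()  # temporal average pooling
    logits = pooled @ W                 # one score per action class
    return int(np.argmax(logits))       # predicted class index

label = classify_clip(clip, W)
print(label)
```

Real systems replace the temporal average and linear head with spatiotemporal networks trained on labeled video, but the clip-in, class-out interface is the same.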
Libraries
Use these libraries to find Action Recognition models and implementations.
Subtasks
- Action Recognition In Videos
- 3D Action Recognition
- Self-Supervised Action Recognition
- Few Shot Action Recognition
- Fine-grained Action Recognition
- Action Triplet Recognition
- Open Set Action Recognition
- Micro-Action Recognition
- Weakly-Supervised Action Recognition
- Atomic Action Recognition
- Animal Action Recognition
- Transportation Mode Detection
- Open Vocabulary Action Recognition
- Action Recognition In Still Images
Latest papers
Real-Time Multimodal Cognitive Assistant for Emergency Medical Services
Emergency Medical Services (EMS) responders often operate under time-sensitive conditions, facing cognitive overload and inherent risks, requiring essential skills in critical thinking and rapid decision-making.
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
This approach greatly reduces the number of learnable parameters compared to full tuning.
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment.
Video Relationship Detection Using Mixture of Experts
Secondly, classifiers trained by a single, monolithic neural network often lack stability and generalization.
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
To answer this, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action, and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps.
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
A recent architecture, Mamba, based on state space models has been shown to achieve comparable performance for modeling text sequences, while scaling linearly with the sequence length.
Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping
Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems.
Taylor Videos for Action Recognition
Addressing these challenges, we propose the Taylor video, a new video format whose frames, called Taylor frames, highlight the dominant motions (e.g., a waving hand).
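The paper's exact construction is not given in this snippet; as a rough illustrative sketch only, assuming a Taylor frame combines a base frame with factorial-scaled temporal finite differences (a truncated Taylor expansion along the time axis), the idea could look like:

```python
import numpy as np

def taylor_frame(frames: np.ndarray, order: int = 2) -> np.ndarray:
    """Illustrative Taylor frame: base frame plus scaled temporal
    derivatives, approximated by finite differences, so that regions
    with motion contribute more than static background."""
    out = frames[0].astype(float).copy()
    deriv = frames.astype(float)
    factorial = 1.0
    for k in range(1, order + 1):
        deriv = np.diff(deriv, axis=0)   # k-th temporal difference
        factorial *= k
        out += deriv[0] / factorial      # Taylor term: f^(k)(0) / k!
    return out

# A fake 4-frame grayscale clip stands in for a real video.
frames = np.random.default_rng(0).standard_normal((4, 8, 8))
tf = taylor_frame(frames)
print(tf.shape)  # (8, 8)
```

For a perfectly static clip the difference terms vanish and the Taylor frame reduces to the first frame; moving regions add nonzero derivative terms, which is the motion-highlighting intuition.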
AutoGCN -- Towards Generic Human Activity Recognition with Neural Architecture Search
This paper introduces AutoGCN, a generic Neural Architecture Search (NAS) algorithm for Human Activity Recognition (HAR) using Graph Convolution Networks (GCNs).
Image-based human re-identification: Which covariates are actually (the most) important?
Human re-identification (re-ID) is nowadays among the most popular topics in computer vision, due to the increasing importance given to safety/security in modern societies.