Action Recognition

883 papers with code • 49 benchmarks • 105 datasets

Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify the action being performed into one of a predefined set of action classes.
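As a minimal sketch, clip-level classification can be reduced to pooling per-frame features over time and scoring the pooled descriptor against a fixed label set. The arrays below are random placeholders for a real feature backbone and trained classifier:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a "video" as T frames of D-dim features
# (in practice these would come from a CNN/ViT backbone).
T, D, num_classes = 16, 128, 5
frame_features = rng.normal(size=(T, D))

# Linear classifier over a temporally pooled clip descriptor
# (random weights here, standing in for a trained head).
W = rng.normal(size=(D, num_classes))
b = np.zeros(num_classes)

clip_descriptor = frame_features.mean(axis=0)   # temporal average pooling
logits = clip_descriptor @ W + b
predicted_class = int(np.argmax(logits))        # index into the label set
print(predicted_class)
```

Real systems replace the average pooling with learned temporal modeling (3D convolutions, attention, recurrence), but the classify-into-a-fixed-label-set framing is the same.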

In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.

Please note some benchmarks may be located in the Action Classification or Video Classification tasks, e.g. Kinetics-400.

Libraries

Use these libraries to find Action Recognition models and implementations

Real-Time Multimodal Cognitive Assistant for Emergency Medical Services

uva-dsa/ems-pipeline 11 Mar 2024

Emergency Medical Services (EMS) responders often operate under time-sensitive conditions, facing cognitive overload and inherent risks, requiring essential skills in critical thinking and rapid decision-making.


Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling

wgcban/apt 11 Mar 2024

This approach greatly reduces the number of learnable parameters compared to full tuning.
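The scale of the savings can be illustrated with rough parameter counts for a hypothetical ViT-like backbone; the layer sizes and prompt length below are illustrative assumptions, not the paper's configuration:

```python
# Hypothetical ViT-like backbone dimensions (illustrative only).
layers, dim, mlp_ratio = 12, 768, 4

# Per layer: ~4*dim^2 for attention projections, 2*dim*(mlp_ratio*dim) for MLP.
backbone_params = layers * (4 * dim * dim + 2 * dim * mlp_ratio * dim)

# Prompt tuning: learn only a few prompt tokens per layer,
# keeping the entire backbone frozen.
num_prompts = 8
prompt_params = layers * num_prompts * dim

print(backbone_params, prompt_params, prompt_params / backbone_params)
```

Under these assumptions the tunable parameters drop by roughly three orders of magnitude, which is the kind of reduction parameter-efficient methods target.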


Benchmarking Micro-action Recognition: Dataset, Methods, and Applications

vut-hfut/micro-action 8 Mar 2024

It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment.


Video Relationship Detection Using Mixture of Experts

shibshib/Moe-VRD IEEE Access 2023

Classifiers trained by a single, monolithic neural network often lack stability and generalization.
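A mixture of experts addresses this by combining several smaller models through a learned gate. A toy sketch with linear experts and random weights (not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_out, n_experts = 16, 4, 3

# Each "expert" is a small linear map; a gating network mixes them.
experts = [rng.normal(size=(d_in, d_out)) for _ in range(n_experts)]
gate_W = rng.normal(size=(d_in, n_experts))

def moe_forward(x):
    gate_logits = x @ gate_W
    gate = np.exp(gate_logits - gate_logits.max())
    gate /= gate.sum()                           # softmax over experts
    outputs = np.stack([x @ E for E in experts]) # (n_experts, d_out)
    return gate @ outputs                        # convex combination

y = moe_forward(rng.normal(size=d_in))
print(y.shape)
```

Because each expert only has to cover part of the input distribution, the ensemble tends to be more stable than one monolithic network of the same capacity.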

06 Mar 2024

Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition

kunyulin/xov-action 3 Mar 2024

To answer this, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action, and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps.
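The CLIP-style open-vocabulary recipe scores a video embedding against text embeddings of candidate class names, so new classes can be added without retraining. A toy sketch with random placeholder embeddings (a real model would produce them from the clip and from prompts like "a video of a person {action}"):

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 64

# Hypothetical pre-computed embeddings standing in for CLIP outputs.
class_names = ["jumping", "swimming", "typing"]
text_emb = rng.normal(size=(len(class_names), dim))
video_emb = rng.normal(size=dim)

def normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

# Cosine similarity between the video and each class-name embedding.
sims = normalize(text_emb) @ normalize(video_emb)
pred = class_names[int(np.argmax(sims))]
print(pred)
```

The domain-gap question the benchmark probes is precisely whether these similarities remain discriminative when the video distribution shifts.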


Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data

jacklishufan/mamba-nd 8 Feb 2024

A recent architecture, Mamba, based on state space models has been shown to achieve comparable performance for modeling text sequences, while scaling linearly with the sequence length.
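The linear scaling comes from the recurrent form of a state-space model, which processes the sequence in a single pass. A minimal fixed-matrix (non-selective) sketch, not Mamba itself:

```python
import numpy as np

rng = np.random.default_rng(3)
d_state = 4

# Linear state-space recurrence: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# One sweep over the sequence => O(T) in sequence length, unlike the
# O(T^2) pairwise interactions of self-attention.
A = 0.9 * np.eye(d_state)                # stable state transition
B = rng.normal(size=(d_state, 1))
C = rng.normal(size=(1, d_state))

def ssm_scan(x):
    h = np.zeros((d_state, 1))
    ys = []
    for x_t in x:                        # single linear pass
        h = A @ h + B * x_t
        ys.append((C @ h).item())
    return ys

ys = ssm_scan(rng.normal(size=100))
print(len(ys))
```

Selective SSMs like Mamba make A, B, and C input-dependent, but keep this single-pass, linear-time structure.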


Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping

Trustworthy-AI-Group/TransferAttack 6 Feb 2024

Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems.
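The transfer setup can be sketched with a one-step FGSM-style perturbation crafted on a toy linear surrogate; this illustrates gradient-sign attacks generally, not the paper's deformation-constrained warping:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 10

# Toy surrogate "model": score = w . x; the attack pushes the score down.
w_surrogate = rng.normal(size=d)
x = rng.normal(size=d)

# FGSM-style step: move along the sign of the loss gradient wrt x.
# For loss = -w.x (drive the score down), the gradient wrt x is -w.
eps = 0.1
x_adv = x + eps * np.sign(-w_surrogate)

print(w_surrogate @ x_adv < w_surrogate @ x)  # surrogate score decreased
```

Whether the same perturbation also fools a *different* target model is the transferability question; it generally holds only when the two models' decision boundaries are aligned, which is what transfer-attack methods try to encourage.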


Taylor Videos for Action Recognition

leiwangr/video-ar 5 Feb 2024

Addressing these challenges, we propose the Taylor video, a new video format that highlights the dominant motions (e.g., a waving hand) in each of its frames, called Taylor frames.
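The idea can be approximated with finite differences: truncating a temporal Taylor expansion adds scaled difference frames, which emphasize motion over static appearance. A toy construction, not the paper's exact definition:

```python
import numpy as np

rng = np.random.default_rng(5)
T, H, W = 8, 4, 4
frames = rng.normal(size=(T, H, W))   # placeholder grayscale frames

def taylor_frame(frames, t, order=2):
    """Frame at time t plus scaled finite-difference "derivative" terms,
    mimicking a truncated temporal Taylor expansion."""
    out = frames[t].copy()
    diff = frames
    fact = 1.0
    for k in range(1, order + 1):
        diff = np.diff(diff, axis=0)  # k-th temporal difference
        fact *= k
        if t < diff.shape[0]:
            out += diff[t] / fact     # add the k-th order term / k!
    return out

tf = taylor_frame(frames, 0)
print(tf.shape)
```

Static background pixels have near-zero temporal differences, so the higher-order terms contribute mostly where motion occurs.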


AutoGCN -- Towards Generic Human Activity Recognition with Neural Architecture Search

deepinmotion/autogcn 2 Feb 2024

This paper introduces AutoGCN, a generic Neural Architecture Search (NAS) algorithm for Human Activity Recognition (HAR) using Graph Convolution Networks (GCNs).
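A single Kipf-style graph-convolution layer over a toy three-joint "skeleton" shows the basic GCN building block such a search composes (random features and weights, not the paper's network):

```python
import numpy as np

# X' = relu( D^{-1/2} (A + I) D^{-1/2} X W )  -- Kipf-style normalization.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)   # joint adjacency (a 3-joint chain)
A_hat = A + np.eye(3)                    # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt

rng = np.random.default_rng(6)
X = rng.normal(size=(3, 8))              # per-joint input features
W = rng.normal(size=(8, 16))             # learnable layer weights

H = np.maximum(A_norm @ X @ W, 0)        # ReLU activation
print(H.shape)
```

A NAS procedure like AutoGCN searches over how many such layers to stack, their widths, and the temporal modules between them.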


Image-based human re-identification: Which covariates are actually (the most) important?

KaiyangZhou/deep-person-reid Image and Vision Computing 2024

Human re-identification (re-ID) is nowadays among the most popular topics in computer vision, due to the increasing importance given to safety/security in modern societies.
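Re-ID is typically framed as retrieval: embed the query and a gallery, then rank by cosine similarity. A sketch with placeholder embeddings, where a perturbed gallery entry stands in for a second view of the same identity:

```python
import numpy as np

rng = np.random.default_rng(7)
dim = 32

# Random placeholders for the output of a real re-ID backbone.
gallery = rng.normal(size=(5, dim))
query = gallery[2] + 0.05 * rng.normal(size=dim)  # noisy view of identity 2

def l2norm(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

# Rank gallery identities by cosine similarity to the query.
sims = l2norm(gallery) @ l2norm(query)
match = int(np.argmax(sims))
print(match)  # recovers identity 2
```

Covariate studies like this one ask which factors (clothing, viewpoint, resolution, and so on) most degrade that similarity ranking.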

20 Jan 2024