Action Recognition
883 papers with code • 49 benchmarks • 105 datasets
Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify and categorize the actions being performed in the video or image into a predefined set of action classes.
In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.
Please note some benchmarks may be located in the Action Classification or Video Classification tasks, e.g. Kinetics-400.
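At its core, the task maps a clip (a stack of frames) to one label from a predefined set of action classes. As a minimal illustrative sketch of that input/output contract (not any benchmarked model), the toy example below pools a clip over time and scores it with a random linear head; all shapes and the classifier itself are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_CLASSES = 5  # size of the predefined action-class set (illustrative)

# A fake clip of 8 RGB frames (T, H, W, C) stands in for a real video.
clip = rng.standard_normal((8, 32, 32, 3))

# Random linear head mapping a pooled frame to class scores (untrained).
W = rng.standard_normal((clip[0].size, NUM_CLASSES)) * 0.01

def classify_clip(clip: np.ndarray, W: np.ndarray) -> int:
    """Average-pool frames over time, then score with a linear head."""
    pooled = clip.mean(axis=0).ravel()  # temporal average pooling
    logits = pooled @ W                 # one score per action class
    return int(np.argmax(logits))       # predicted class index

label = classify_clip(clip, W)
print(label)
```

Real systems replace the temporal average and linear head with spatiotemporal networks trained on labeled video, but the clip-in, class-out interface is the same.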
Libraries
Use these libraries to find Action Recognition models and implementations.
Subtasks
- Action Recognition In Videos
- 3D Action Recognition
- Self-Supervised Action Recognition
- Few Shot Action Recognition
- Fine-grained Action Recognition
- Action Triplet Recognition
- Open Set Action Recognition
- Micro-Action Recognition
- Weakly-Supervised Action Recognition
- Atomic Action Recognition
- Animal Action Recognition
- Transportation Mode Detection
- Open Vocabulary Action Recognition
- Action Recognition In Still Images
Latest papers
Real-Time Multimodal Cognitive Assistant for Emergency Medical Services
Emergency Medical Services (EMS) responders often operate under time-sensitive conditions, facing cognitive overload and inherent risks, requiring essential skills in critical thinking and rapid decision-making.
Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling
This approach greatly reduces the number of learnable parameters compared to full tuning.
Benchmarking Micro-action Recognition: Dataset, Methods, and Applications
It offers insights into the feelings and intentions of individuals and is important for human-oriented applications such as emotion recognition and psychological assessment.
Video Relationship Detection Using Mixture of Experts
Secondly, classifiers trained by a single, monolithic neural network often lack stability and generalization.
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
To answer this, we establish a CROSS-domain Open-Vocabulary Action recognition benchmark named XOV-Action, and conduct a comprehensive evaluation of five state-of-the-art CLIP-based video learners under various types of domain gaps.
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
A recent architecture, Mamba, based on state space models has been shown to achieve comparable performance for modeling text sequences, while scaling linearly with the sequence length.
Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping
Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems.
Taylor Videos for Action Recognition
Addressing these challenges, we propose the Taylor video, a new video format whose frames, called Taylor frames, highlight the dominant motions (e.g., a waving hand).
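The paper's exact construction is not given in this snippet; as a rough illustrative sketch only, assuming a Taylor frame combines a base frame with factorial-scaled temporal finite differences (a truncated Taylor expansion along the time axis), the idea could look like:

```python
import numpy as np

def taylor_frame(frames: np.ndarray, order: int = 2) -> np.ndarray:
    """Illustrative Taylor frame: base frame plus scaled temporal
    derivatives, approximated by finite differences, so that regions
    with motion contribute more than static background."""
    out = frames[0].astype(float).copy()
    deriv = frames.astype(float)
    factorial = 1.0
    for k in range(1, order + 1):
        deriv = np.diff(deriv, axis=0)   # k-th temporal difference
        factorial *= k
        out += deriv[0] / factorial      # Taylor term: f^(k)(0) / k!
    return out

# A fake 4-frame grayscale clip stands in for a real video.
frames = np.random.default_rng(0).standard_normal((4, 8, 8))
tf = taylor_frame(frames)
print(tf.shape)  # (8, 8)
```

For a perfectly static clip the difference terms vanish and the Taylor frame reduces to the first frame; moving regions add nonzero derivative terms, which is the motion-highlighting intuition.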
AutoGCN -- Towards Generic Human Activity Recognition with Neural Architecture Search
This paper introduces AutoGCN, a generic Neural Architecture Search (NAS) algorithm for Human Activity Recognition (HAR) using Graph Convolution Networks (GCNs).
Image-based human re-identification: Which covariates are actually (the most) important?
Human re-identification (re-ID) is nowadays among the most popular topics in computer vision, due to the increasing importance given to safety/security in modern societies.