Action Recognition
881 papers with code • 49 benchmarks • 105 datasets
Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify the observed action into one of a predefined set of action classes.
In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will yield a similar performance boost when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular action recognition benchmarks are small, on the order of 10k videos.
Please note some benchmarks may be located in the Action Classification or Video Classification tasks, e.g. Kinetics-400.
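To make the classification setup above concrete, here is a minimal sketch of clip-level action classification. Everything in it is a hypothetical stand-in: the frame features, classifier weights, and class names are random placeholders, not any benchmark's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 16 frames, each encoded as a 128-d feature vector
# by some backbone; 5 predefined action classes (all names invented).
frame_features = rng.standard_normal((16, 128))
classifier_w = rng.standard_normal((128, 5))
classes = ["walk", "run", "jump", "sit", "wave"]

# Temporal average pooling collapses the frame axis into one clip descriptor.
clip_feature = frame_features.mean(axis=0)   # shape (128,)

# Linear classifier + softmax over the predefined action classes.
logits = clip_feature @ classifier_w         # shape (5,)
probs = np.exp(logits - logits.max())
probs /= probs.sum()

predicted = classes[int(np.argmax(probs))]
```

Real systems replace the random features with a learned spatio-temporal backbone (3D CNN or video Transformer), but the final step is still a distribution over a fixed label set.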
Libraries
Use these libraries to find Action Recognition models and implementations.
Subtasks
- Action Recognition In Videos
- 3D Action Recognition
- Self-Supervised Action Recognition
- Few Shot Action Recognition
- Fine-grained Action Recognition
- Action Triplet Recognition
- Open Set Action Recognition
- Micro-Action Recognition
- Weakly-Supervised Action Recognition
- Atomic Action Recognition
- Animal Action Recognition
- Transportation Mode Detection
- Open Vocabulary Action Recognition
- Action Recognition In Still Images
Latest papers
DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition
Graph convolutional networks (GCN) have recently been studied to exploit the graph topology of the human body for skeleton-based action recognition.
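The core idea of exploiting the body's graph topology can be sketched in a few lines. The skeleton below is a toy 5-joint chain with random weights, not DeGCN itself; it shows one generic graph-convolution layer of the form ReLU(Â X W) over joint coordinates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy skeleton: 5 joints connected in a chain (hypothetical topology).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n = 5
A = np.eye(n)                        # adjacency with self-loops
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

X = rng.standard_normal((n, 3))      # per-joint 3D coordinates
W = rng.standard_normal((3, 8))      # projection weights (random here, learned in practice)

# One graph-convolution layer: aggregate over connected joints, project, ReLU.
H = np.maximum(A_hat @ X @ W, 0.0)   # shape (5, 8)
```

Skeleton-action models stack such layers (often with temporal convolutions across frames) so that each joint's representation mixes information from its anatomical neighbors.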
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
We introduce InternVideo2, a new video foundation model (ViFM) that achieves the state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.
GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition
Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision.
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
To tackle these issues, we propose training free token merging for lightweight video Transformer (vid-TLDR) that aims to enhance the efficiency of video Transformers by merging the background tokens without additional training.
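The general flavor of training-free token reduction can be illustrated with a toy example. The saliency scores and embeddings below are random stand-ins (not vid-TLDR's actual saliency computation): low-saliency tokens are treated as background and merged into one averaged token, shrinking the sequence without any parameter updates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical clip: 10 tokens with 4-d embeddings and attention-derived
# saliency scores (random placeholders for a real video Transformer).
tokens = rng.standard_normal((10, 4))
saliency = rng.random(10)

# Keep the k most salient tokens; merge the rest (treated as background)
# into a single averaged token, reducing the sequence length for later layers.
k = 4
order = np.argsort(saliency)[::-1]
kept = tokens[order[:k]]
background = tokens[order[k:]].mean(axis=0, keepdims=True)

reduced = np.vstack([kept, background])   # shape (5, 4)
```

Since self-attention cost grows quadratically with sequence length, cutting 10 tokens to 5 this way roughly quarters the attention cost of subsequent layers.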
A Lie Group Approach to Riemannian Batch Normalization
Using the deformation concept, we generalize the existing Lie groups on SPD manifolds into three families of parameterized Lie groups.
Skeleton-Based Human Action Recognition with Noisy Labels
In this study, we bridge this gap by implementing a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas to serve as the initial benchmark.
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
We categorize the key skeletal-temporal relations for action recognition into a total of four distinct types.
EventRPG: Event Data Augmentation with Relevance Propagation Guidance
Based on this, we propose EventRPG, which leverages relevance propagation on the spiking neural network for more efficient augmentation.
On the Utility of 3D Hand Poses for Action Recognition
3D hand poses are an under-explored modality for action recognition.
Real-Time Multimodal Cognitive Assistant for Emergency Medical Services
Emergency Medical Services (EMS) responders often operate under time-sensitive conditions, facing cognitive overload and inherent risks, requiring essential skills in critical thinking and rapid decision-making.