Action Recognition

881 papers with code • 49 benchmarks • 105 datasets

Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to assign each action being performed in the video or image to one of a predefined set of action classes.
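Here is a minimal sketch of the classification step. It assumes some backbone, not shown, has already produced one logit per action class for each frame. A video-level prediction is then formed by averaging those scores over time and taking the most probable class. The three-class label set and the function names are hypothetical. Real benchmarks such as Kinetics-400 define their own class lists.

```python
import math

# Hypothetical label set for illustration only.
ACTION_CLASSES = ["walking", "running", "jumping"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_video(frame_logits, classes=ACTION_CLASSES):
    """Pool per-frame logits over time and return (predicted_class, probabilities).

    frame_logits: list of T rows, one per frame, each holding one logit per class.
    The backbone that produces these logits is assumed and not shown.
    """
    num_frames = len(frame_logits)
    # Temporal average pooling: a single score per class for the whole video.
    pooled = [sum(frame[c] for frame in frame_logits) / num_frames
              for c in range(len(classes))]
    probs = softmax(pooled)
    best = max(range(len(classes)), key=lambda c: probs[c])
    return classes[best], probs
```

For example, `classify_video([[0.1, 2.0, 0.3], [0.2, 1.5, 0.1], [0.0, 2.5, 0.4]])` pools the scores to roughly `[0.1, 2.0, 0.27]` and returns `"running"` together with its softmax probabilities. Temporal average pooling is only one simple aggregation choice. Many video models instead model time inside the network.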

In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a performance boost similar to the one ImageNet pre-training gives image tasks when the network is applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.

Please note some benchmarks may be located in the Action Classification or Video Classification tasks, e.g. Kinetics-400.

DeGCN: Deformable Graph Convolutional Networks for Skeleton-Based Action Recognition

WoominM/DeGCN_pytorch IEEE Transactions on Image Processing 2024

Graph convolutional networks (GCN) have recently been studied to exploit the graph topology of the human body for skeleton-based action recognition.

4 GitHub stars • 25 Mar 2024
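Skeleton-based methods treat joints as graph nodes and bones as edges. Below is a single layer of the widely used symmetric-normalized graph convolution, H = ReLU(D^{-1/2}(A + I)D^{-1/2} X W), applied to a fixed skeleton graph. It is an illustrative baseline under that assumption, not DeGCN's deformable formulation. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(num_joints, bones):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph.

    bones: list of (i, j) joint-index pairs connected by a bone.
    """
    # Start from the identity so every joint has a self-loop.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def gcn_layer(x, a_hat, w):
    """One spatial graph convolution: ReLU(A_hat @ X @ W).

    x: num_joints x in_features (e.g. 3D joint coordinates for one frame).
    w: in_features x out_features weight matrix (learned in practice).
    """
    h = matmul(matmul(a_hat, x), w)
    return [[max(0.0, v) for v in row] for row in h]
```

For a three-joint chain with bones `(0, 1)` and `(1, 2)`, the middle joint has degree 3 after adding self-loops. So `normalized_adjacency(3, [(0, 1), (1, 2)])[0][1]` equals `1/sqrt(2*3)`. Skeleton action recognition models commonly stack such layers with temporal convolutions between them.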

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

opengvlab/internvideo 22 Mar 2024

We introduce InternVideo2, a new video foundation model (ViFM) that achieves state-of-the-art performance in action recognition, video-text tasks, and video-centric dialogue.

921 GitHub stars

GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition

deepintostreams/gcn-devlstm 22 Mar 2024

Skeleton-based action recognition (SAR) in videos is an important but challenging task in computer vision.

3 GitHub stars

vid-TLDR: Training Free Token merging for Light-weight Video Transformer

mlvlab/vid-tldr 20 Mar 2024

We propose training-free token merging for lightweight video Transformers (vid-TLDR), which aims to improve the efficiency of video Transformers by merging background tokens without additional training.

16 GitHub stars
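Token merging shortens the sequence a Transformer processes by combining similar tokens, with no retraining. Here is a simplified, generic version that greedily averages the most cosine-similar pair of token vectors. It is not vid-TLDR's algorithm, which according to the description above targets background tokens. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def merge_tokens(tokens, num_merges):
    """Greedily merge the most similar token pair up to `num_merges` times.

    Each merge replaces two tokens with their size-weighted mean, so the
    sequence shrinks by one token per merge. Nothing is trained.
    Returns (merged_tokens, sizes), where sizes[k] counts original tokens.
    """
    tokens = [list(t) for t in tokens]
    sizes = [1] * len(tokens)
    # Always leave at least one token.
    for _ in range(min(num_merges, len(tokens) - 1)):
        best, pair = -2.0, None
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                s = cosine(tokens[i], tokens[j])
                if s > best:
                    best, pair = s, (i, j)
        i, j = pair
        total = sizes[i] + sizes[j]
        tokens[i] = [(a * sizes[i] + b * sizes[j]) / total
                     for a, b in zip(tokens[i], tokens[j])]
        sizes[i] = total
        del tokens[j], sizes[j]
    return tokens, sizes
```

With tokens `[[1, 0], [0.9, 0.1], [0, 1]]` and one merge, the first two tokens are combined into `[0.95, 0.05]` and `sizes` becomes `[2, 1]`. The exhaustive pairwise search costs O(n^2) similarity computations per merge. Practical methods such as ToMe use cheaper bipartite matching instead.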

A Lie Group Approach to Riemannian Batch Normalization

gitzh-chen/liebn 17 Mar 2024

Using the deformation concept, we generalize the existing Lie groups on symmetric positive definite (SPD) manifolds into three families of parameterized Lie groups.

6 GitHub stars

Skeleton-Based Human Action Recognition with Noisy Labels

xuyizdby/noiseerasar 15 Mar 2024

In this study, we implement a framework that augments well-established skeleton-based human action recognition methods with label-denoising strategies from various research areas, to serve as an initial benchmark.

2 GitHub stars
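One widely used family of label-denoising strategies is small-loss sample selection, popularized by Co-teaching. Samples with unusually high loss are treated as likely mislabeled and excluded from the update. Only that selection step is shown here. It is not necessarily among the strategies benchmarked in the paper, and the function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Per-sample cross-entropy given predicted class probabilities."""
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)


def select_small_loss(batch_probs, labels, keep_ratio):
    """Return sorted indices of the keep_ratio fraction of lowest-loss samples.

    At least one sample is always kept. In a training loop, only these
    indices would contribute to the gradient update.
    """
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    n_keep = max(1, int(round(keep_ratio * len(losses))))
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:n_keep])
```

Take predicted probabilities `[[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.5, 0.5]]`, labels `[0, 0, 0, 1]` and `keep_ratio=0.5`. The function keeps indices `[0, 2]` and discards the sample whose label disagrees most with the prediction.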

SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

KAIST-VICLab/SkateFormer 14 Mar 2024

We categorize the key skeletal-temporal relations for action recognition into four distinct types.

13 GitHub stars

EventRPG: Event Data Augmentation with Relevance Propagation Guidance

myuansun/eventrpg 14 Mar 2024

We propose EventRPG, which leverages relevance propagation on a spiking neural network to guide more efficient data augmentation.

6 GitHub stars

On the Utility of 3D Hand Poses for Action Recognition

s-shamil/HandFormer 14 Mar 2024

3D hand poses are an under-explored modality for action recognition.

1 GitHub star

Real-Time Multimodal Cognitive Assistant for Emergency Medical Services

uva-dsa/ems-pipeline 11 Mar 2024

Emergency Medical Services (EMS) responders often operate under time-sensitive conditions and face cognitive overload and inherent risks, which demands strong skills in critical thinking and rapid decision-making.

13 GitHub stars