Action Recognition In Videos
64 papers with code • 17 benchmarks • 17 datasets
Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
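As a minimal illustration of the task (not any particular paper's method), the simplest pipeline extracts a feature vector per frame, pools over time, and applies a linear classifier over the action classes. All names and shapes below are hypothetical placeholders:

```python
import numpy as np

# Hypothetical label set for illustration.
ACTIONS = ["running", "jumping", "swimming"]

def classify_video(frame_features: np.ndarray, W: np.ndarray, b: np.ndarray) -> str:
    """Average per-frame features over time, then apply a linear classifier."""
    clip_feature = frame_features.mean(axis=0)   # (D,) temporal average pooling
    logits = clip_feature @ W + b                # (num_classes,)
    return ACTIONS[int(np.argmax(logits))]

# Toy example: 16 frames with 8-dim features and random classifier weights.
rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 8))
W = rng.normal(size=(8, len(ACTIONS)))
b = np.zeros(len(ACTIONS))
label = classify_video(frames, W, b)
```

Real systems replace the random features with a learned spatiotemporal backbone (3D CNN, two-stream network, or video transformer), but the classify-over-pooled-features structure is the common baseline.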
Libraries
Use these libraries to find Action Recognition In Videos models and implementations.
Datasets
Latest papers
MMNet: A Model-Based Multimodal Network for Human Action Recognition in RGB-D Videos
After aggregating the results of multiple modalities, our method outperforms state-of-the-art approaches on six evaluation protocols across five datasets; thus, the proposed MMNet can effectively capture mutually complementary features in different RGB-D video modalities and provide more discriminative features for HAR.
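Aggregating results across modalities is often done by late score fusion: each stream (e.g. RGB, depth, skeleton) produces class logits, and their softmax probabilities are averaged before the final prediction. A minimal sketch, with hypothetical weights and logits (this is a generic fusion baseline, not MMNet itself):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_modalities(rgb_logits: np.ndarray,
                    depth_logits: np.ndarray,
                    weights=(0.5, 0.5)) -> int:
    """Late fusion: weighted average of per-modality class probabilities."""
    probs = weights[0] * softmax(rgb_logits) + weights[1] * softmax(depth_logits)
    return int(np.argmax(probs))
```

The weights can be tuned on a validation set; equal weighting is the usual starting point.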
DirecFormer: A Directed Attention in Transformer Approach to Robust Action Recognition
Various 3D-CNN based methods have been presented to tackle both the spatial and temporal dimensions in the task of video action recognition with competitive results.
Self-supervised Video Transformer
To the best of our knowledge, the proposed approach is the first to alleviate the dependency on negative samples or dedicated memory banks in Self-supervised Video Transformer (SVT).
Florence: A New Foundation Model for Computer Vision
Computer vision foundation models, which are trained on diverse, large-scale datasets and can be adapted to a wide range of downstream tasks, are critical for this mission to solve real-world computer vision applications.
Logsig-RNN: a novel network for robust and efficient skeleton-based action recognition
In this paper, we propose a novel module, namely Logsig-RNN, which is the combination of the log-signature layer and recurrent type neural networks (RNNs).
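To give intuition for the log-signature layer, here is a rough numpy sketch of a depth-2 log-signature of a discrete path: level 1 is the total increment, and level 2 is the Lévy area (the antisymmetric part of the second-level signature). This is a simplified stand-in, not the paper's implementation (which would typically use a signature library and feed the result into an RNN):

```python
import numpy as np

def logsignature_depth2(path: np.ndarray):
    """Depth-2 log-signature approximation for a path of shape (T, d)."""
    increments = np.diff(path, axis=0)      # (T-1, d) step increments
    level1 = increments.sum(axis=0)         # X_T - X_0 (first level)
    displaced = path[:-1] - path[0]         # left-point displacements from start
    sig2 = displaced.T @ increments         # second-level signature approximation
    levy_area = 0.5 * (sig2 - sig2.T)       # antisymmetric part = level-2 log-signature
    return level1, levy_area
```

For a straight-line path the Lévy area vanishes, which is one reason the log-signature is a compact, robust summary of the path's shape rather than of its raw samples.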
ActionCLIP: A New Paradigm for Video Action Recognition
Moreover, to handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".
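In a CLIP-style "pre-train, prompt and fine-tune" setup, label texts are wrapped in prompts (e.g. "a video of a person running") and classification reduces to cosine similarity between a video embedding and the prompted text embeddings. A minimal sketch with hypothetical embeddings standing in for the learned encoders:

```python
import numpy as np

def zero_shot_action(video_emb: np.ndarray,
                     prompt_embs: np.ndarray,
                     labels: list) -> str:
    """Pick the label whose prompted text embedding is most similar
    (by cosine similarity) to the video embedding."""
    v = video_emb / np.linalg.norm(video_emb)
    t = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    return labels[int(np.argmax(t @ v))]
```

Because the label set enters only through text prompts, new action classes can be added at inference time without retraining the classifier head.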
Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting
Instance-level contrastive learning techniques, which rely on data augmentation and a contrastive loss function, have found great success in the domain of visual representation learning.
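The contrastive loss referred to here is typically InfoNCE: each anchor embedding should be most similar to its own augmented positive among all positives in the batch. A minimal numpy sketch (a generic InfoNCE, not the paper's cross-stream prototypical variant):

```python
import numpy as np

def info_nce_loss(anchors: np.ndarray, positives: np.ndarray,
                  temperature: float = 0.1) -> float:
    """InfoNCE over a batch: row i of `anchors` should match row i of `positives`."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                     # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))         # diagonal = true pairs
```

The loss is small when matched pairs are far more similar than mismatched ones, which is exactly what drives the learned representations apart at the instance level.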
Space-time Mixing Attention for Video Transformer
In this work, we propose a Video Transformer model whose complexity scales linearly with the number of frames in the video sequence, and which hence incurs no overhead compared to an image-based Transformer model.
Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition
In our TSMF, we utilize a teacher network to transfer the structural knowledge of the skeleton modality to a student network for the RGB modality.
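Teacher-to-student knowledge transfer of this kind is commonly trained with a temperature-softened KL-divergence distillation loss between the two networks' class distributions. A minimal numpy sketch of that standard loss (the TSMF paper transfers structural skeleton knowledge, so its actual objective differs in detail):

```python
import numpy as np

def softmax(z: np.ndarray, T: float = 1.0) -> np.ndarray:
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits: np.ndarray,
                      teacher_logits: np.ndarray,
                      T: float = 4.0) -> float:
    """KL divergence between temperature-softened teacher and student
    distributions, scaled by T^2 as in standard distillation."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)
    return float(np.mean(kl) * T * T)
```

The higher temperature exposes the teacher's "dark knowledge" in the relative probabilities of non-target classes, which is what the student learns from.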
Learning Implicit Temporal Alignment for Few-shot Video Classification
Few-shot video classification aims to learn new video categories with only a few labeled examples, alleviating the burden of costly annotation in real-world applications.