Action Recognition In Videos

64 papers with code • 17 benchmarks • 17 datasets

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
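
As a quick illustration of the task, here is a minimal inference sketch using a 3D CNN pretrained on Kinetics-400 from torchvision; the random tensor stands in for a real decoded clip, and the clip length and resolution are illustrative.

```python
# Minimal video action recognition sketch with a pretrained 3D CNN.
import torch
from torchvision.models.video import r3d_18, R3D_18_Weights

model = r3d_18(weights=R3D_18_Weights.KINETICS400_V1).eval()

# A video clip as a (batch, channels, time, height, width) tensor.
clip = torch.randn(1, 3, 16, 112, 112)  # stands in for decoded frames

with torch.no_grad():
    logits = model(clip)              # (1, 400) scores over Kinetics-400 classes
    action = logits.argmax(dim=1)     # index of the predicted action class
```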

Libraries

Use these libraries to find Action Recognition In Videos models and implementations

ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

faceonlive/ai-research 9 Apr 2024

Our framework leverages both labeled and unlabeled data to robustly learn action representations in videos, combining pseudo-labeling with contrastive learning to learn effectively from both types of samples.
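
As a rough illustration of that recipe, the hedged sketch below combines a FixMatch-style pseudo-label loss with an InfoNCE contrastive term; the function name, threshold, temperature, and weighting are illustrative assumptions, not the paper's exact losses.

```python
# Hedged sketch: pseudo-labeling + contrastive learning for semi-supervised
# action recognition. Hyperparameters here are illustrative, not the paper's.
import torch
import torch.nn.functional as F

def semi_supervised_loss(logits_l, labels, logits_u_weak, logits_u_strong,
                         emb_a, emb_b, threshold=0.95, tau=0.1, lam=1.0):
    # Supervised cross-entropy on the labeled clips.
    sup = F.cross_entropy(logits_l, labels)

    # Pseudo-labeling: confident predictions on weakly augmented unlabeled
    # clips supervise the strongly augmented views (FixMatch-style).
    probs = logits_u_weak.softmax(dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = (conf >= threshold).float()
    unsup = (F.cross_entropy(logits_u_strong, pseudo, reduction="none") * mask).mean()

    # Contrastive term: embeddings of two views of the same clip attract
    # (InfoNCE over the batch).
    a = F.normalize(emb_a, dim=1)
    b = F.normalize(emb_b, dim=1)
    sim = a @ b.t() / tau                        # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    con = F.cross_entropy(sim, targets)

    return sup + unsup + lam * con
```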

HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition

dun-research/haltingvt 10 Jan 2024

Action recognition in videos poses a challenge due to its high computational cost, especially for Joint Space-Time video transformers (Joint VT).
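
The idea of halting tokens to save compute can be sketched as follows; the linear scorer, top-k rule, and keep ratio are illustrative assumptions rather than HaltingVT's learned halting mechanism.

```python
# Hedged sketch of adaptive token halting: a tiny scorer ranks tokens and
# only the top fraction is passed to the next transformer block.
import torch
import torch.nn as nn

class TokenHalting(nn.Module):
    def __init__(self, dim, keep_ratio=0.5):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)   # per-token halting score
        self.keep_ratio = keep_ratio

    def forward(self, tokens):            # tokens: (batch, n, dim)
        scores = self.scorer(tokens).squeeze(-1)           # (batch, n)
        k = max(1, int(tokens.size(1) * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                # surviving tokens
        idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        return tokens.gather(1, idx)      # (batch, k, dim); the rest halt
```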

CAST: Cross-Attention in Space and Time for Video Action Recognition

khu-vll/cast NeurIPS 2023

In this work, we propose a novel two-stream architecture, called Cross-Attention in Space and Time (CAST), that achieves a balanced spatio-temporal understanding of videos using only RGB input.
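
A hedged sketch of the information-exchange idea, with one stream's tokens attending to the other's; the module names and residual wiring are assumptions, and the paper's actual adapter design differs in detail.

```python
# Hedged sketch of cross-attention between two streams: queries come from
# one expert's tokens, keys/values from the other's.
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, spatial_tokens, temporal_tokens):
        # The spatial expert attends to the temporal expert's tokens.
        q = self.norm_q(spatial_tokens)
        kv = self.norm_kv(temporal_tokens)
        out, _ = self.attn(q, kv, kv)
        return spatial_tokens + out       # residual information exchange
```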

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

mondalanindya/msqnet 20 Jul 2023

Existing action recognition methods are typically actor-specific due to the intrinsic topological and appearance differences among actors.

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

facebookresearch/hiera 1 Jun 2023

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

OpenGVLab/VideoMAEv2 CVPR 2023

Finally, we successfully train a video ViT model with a billion parameters, which achieves new state-of-the-art performance on the Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2) datasets.
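
Dual masking can be sketched as two nested random masks: one decides what the encoder sees, and a second decides which of the remaining tokens the decoder must reconstruct, cutting decoder cost. The ratios and sampling below are illustrative, not the paper's exact scheme.

```python
# Hedged sketch of dual masking for a video masked autoencoder.
import torch

def dual_masks(n_tokens, enc_mask_ratio=0.9, dec_keep_ratio=0.5):
    perm = torch.randperm(n_tokens)
    n_vis = int(n_tokens * (1 - enc_mask_ratio))
    visible = perm[:n_vis]                 # tokens the encoder sees
    hidden = perm[n_vis:]                  # candidates for reconstruction
    n_dec = int(hidden.numel() * dec_keep_ratio)
    decode_targets = hidden[torch.randperm(hidden.numel())[:n_dec]]
    return visible, decode_targets         # encoder mask + decoder mask
```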

Dual-path Adaptation from Image to Video Transformers

park-jungin/dualpath CVPR 2023

In this paper, we efficiently transfer the strong representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters.
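
The general mechanism behind such few-parameter transfer can be sketched as a frozen backbone plus small trainable bottleneck adapters; this illustrates the broad adapter idea under stated assumptions, not the paper's specific dual-path design.

```python
# Hedged sketch of parameter-efficient transfer: freeze the pretrained image
# backbone and train only small residual bottleneck adapters.
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))   # residual adapter

def trainable_params(backbone, adapters):
    for p in backbone.parameters():
        p.requires_grad = False            # frozen foundation model
    # Only the adapter parameters are handed to the optimizer.
    return [p for a in adapters for p in a.parameters()]
```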

Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer

leonlha/Video-Action-Recognition-via-PSO-ConvNet-Transformer-Collaborative-Learning-with-Dynamics 17 Feb 2023

To extend our approach to video, we integrate ConvNets with state-of-the-art temporal methods such as Transformers and Recurrent Neural Networks.
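
The collaborative "dynamics" come from particle swarm optimization; a textbook PSO update over a learner's weight vector looks roughly like the sketch below, with default coefficients rather than the paper's tuned values.

```python
# Hedged sketch of one PSO step: each learner's weight vector is a particle
# pulled toward its own best position and the swarm's best position.
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=np.random):
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v, v   # new position (weights) and velocity
```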

Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning

daniel-code/TubeViT CVPR 2023

We present a simple approach that turns a ViT encoder into an efficient video model able to work seamlessly with both image and video inputs.
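
Sparse tube tokenization can be approximated by a 3D convolution whose stride exceeds its kernel, leaving gaps between tubes; TubeViT mixes multiple tube shapes, and the single shape and sizes below are illustrative.

```python
# Hedged sketch of sparse tube tokenization for a plain ViT encoder.
import torch
import torch.nn as nn

class TubeTokenizer(nn.Module):
    def __init__(self, dim=768, tube=(8, 16, 16), stride=(16, 32, 32)):
        super().__init__()
        # Strides larger than the kernel leave gaps -> sparse tubes.
        self.proj = nn.Conv3d(3, dim, kernel_size=tube, stride=stride)

    def forward(self, video):              # (batch, 3, T, H, W)
        tokens = self.proj(video)          # (batch, dim, t', h', w')
        return tokens.flatten(2).transpose(1, 2)   # (batch, n_tokens, dim)
```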

Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos

bhi-research/ava_mdetr 21 Sep 2022

We show that it is possible to use a multi-modal model to tackle a task that it was not designed for.
