Action Recognition In Videos

64 papers with code • 17 benchmarks • 17 datasets

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
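
As a rough illustration of the final mapping step, the sketch below averages per-frame class scores over a clip and picks the highest-scoring class. It is only a minimal baseline under stated assumptions, not the method of any paper listed here. The label set `ACTION_CLASSES`, the helper `classify_clip`, and the score values are hypothetical, and a frame-level model that produces the scores is assumed rather than shown. Average pooling ignores frame order, so it does not model temporal dynamics the way the approaches below do.

```python
import math

# Hypothetical label set; a real benchmark defines its own classes.
ACTION_CLASSES = ["running", "jumping", "swimming"]


def softmax(scores):
    """Convert raw scores to probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(frame_scores, classes=ACTION_CLASSES):
    """Average per-frame class scores over time, then pick the top class.

    frame_scores: one list of raw class scores per frame, as an assumed
    frame-level model (not shown) might produce.
    """
    if not frame_scores:
        raise ValueError("clip has no frames")
    num_frames = len(frame_scores)
    # Temporal average pooling: one mean score per class.
    pooled = [
        sum(frame[i] for frame in frame_scores) / num_frames
        for i in range(len(classes))
    ]
    probs = softmax(pooled)
    best = max(range(len(classes)), key=probs.__getitem__)
    return classes[best], probs[best]


# Made-up scores for a three-frame clip (illustrative only).
clip = [[0.2, 2.1, 0.1], [0.4, 1.8, 0.3], [0.1, 2.5, 0.2]]
label, confidence = classify_clip(clip)
print(label, round(confidence, 3))
```

Replacing the averaging step with a temporal model (for example a recurrent network or a transformer over frame features) is where the listed methods differ from this baseline.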

Most implemented papers

Busy-Quiet Video Disentangling for Video Classification

guoxih/Busy-Quiet-Video-Disentangling-for-Video-Classification 29 Mar 2021

We design a trainable Motion Band-Pass Module (MBPM) for separating busy information from quiet information in raw video data.

VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text

google-research/google-research NeurIPS 2021

We train VATT end-to-end from scratch using multimodal contrastive losses and evaluate its performance by the downstream tasks of video action recognition, audio event classification, image classification, and text-to-video retrieval.

ActionCLIP: A New Paradigm for Video Action Recognition

sallymmx/actionclip 17 Sep 2021

To handle the deficiency of label texts and make use of tremendous web data, we propose a new paradigm based on this multimodal learning framework for action recognition, which we dub "pre-train, prompt and fine-tune".

Video Action Recognition Collaborative Learning with Dynamics via PSO-ConvNet Transformer

leonlha/video-action-recognition-collaborative-learning-with-dynamics-via-pso-convnet-transformer 17 Feb 2023

To extend our approach to video, we integrate ConvNets with state-of-the-art temporal methods such as Transformer and Recurrent Neural Networks.

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

facebookresearch/hiera 1 Jun 2023

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

rana2149/actnetformer 9 Apr 2024

Our framework leverages both labeled and unlabeled data to robustly learn action representations in videos, combining pseudo-labeling with contrastive learning for effective learning from both types of samples.

Convolutional Two-Stream Network Fusion for Video Action Recognition

feichtenhofer/twostreamfusion CVPR 2016

Recent applications of Convolutional Neural Networks (ConvNets) for human action recognition in videos have proposed different solutions for incorporating the appearance and motion information.

Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters

piergiaj/latent-subevents 26 May 2016

In this paper, we introduce the concept of temporal attention filters and describe how they can be used for human activity recognition from videos.

Spatiotemporal Residual Networks for Video Action Recognition

feichtenhofer/st-resnet NeurIPS 2016

Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos.

Two-stream Flow-guided Convolutional Attention Networks for Action Recognition

antran89/two-stream-fcan 30 Aug 2017

This paper proposes two-stream flow-guided convolutional attention networks for action recognition in videos.