Action Recognition In Videos
64 papers with code • 17 benchmarks • 17 datasets
Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
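The task description above can be sketched as a simple interface: score each frame against a set of action classes, aggregate the scores over time, and return the best class. This is a minimal illustration only; the scorer below is a hypothetical stand-in (frames are single floats), whereas a real system would use a spatiotemporal model such as a 3D CNN.

```python
# Minimal sketch of the action-recognition interface.
# `score_frame` is a hypothetical per-frame classifier, not a real model.

ACTIONS = ["running", "jumping", "swimming"]

def score_frame(frame):
    """Return one score per action class for a single frame.

    `frame` is a stand-in feature (one float) so the example stays
    self-contained; in practice it would be an image tensor.
    """
    # Higher score = closer to an assumed class prototype (0.0, 1.0, 2.0).
    return [-abs(frame - proto) for proto in (0.0, 1.0, 2.0)]

def recognize_action(frames):
    """Average per-frame scores over time and pick the best class."""
    totals = [0.0] * len(ACTIONS)
    for frame in frames:
        for i, s in enumerate(score_frame(frame)):
            totals[i] += s
    best = max(range(len(ACTIONS)), key=lambda i: totals[i])
    return ACTIONS[best]
```

For example, `recognize_action([1.1, 0.9, 1.0])` maps the sequence to the class whose prototype it is closest to on average.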
Libraries
Use these libraries to find Action Recognition In Videos models and implementations.
Latest papers with no code
NAS-TC: Neural Architecture Search on Temporal Convolutions for Complex Action Recognition
Thanks to its automated design of network structures, neural architecture search (NAS) has achieved great success in the image processing field and has attracted substantial research attention in recent years.
Temporal Difference Networks for Action Recognition
To mitigate this issue, this paper presents a new video architecture, termed as Temporal Difference Network (TDN), with a focus on capturing multi-scale temporal information for efficient action recognition.
Developing Motion Code Embedding for Action Recognition in Videos
In this work, we propose a motion embedding strategy known as motion codes, a vectorized representation of motions based on a manipulation's salient mechanical attributes.
Toward Accurate Person-level Action Recognition in Videos of Crowded Scenes
Prior works typically fail to address this problem in two respects: (1) they do not exploit scene information; (2) they lack training data for crowded and complex scenes.
Dynamic Sampling Networks for Efficient Action Recognition in Videos
Existing action recognition methods are mainly based on clip-level classifiers such as two-stream CNNs or 3D CNNs, which are trained on randomly selected clips and applied to densely sampled clips during testing.
Spatiotemporal Fusion in 3D CNNs: A Probabilistic View
Based on the probability space, we further generate new fusion strategies which achieve the state-of-the-art performance on four well-known action recognition datasets.
TEA: Temporal Excitation and Aggregation for Action Recognition
Temporal modeling is key for action recognition in videos.
Dynamic Inference: A New Approach Toward Efficient Video Action Recognition
In a nutshell, we treat the input frames and the network depth of the computational graph as a 2-dimensional grid, on which several checkpoints with a prediction module are placed in advance.
An Information-rich Sampling Technique over Spatio-Temporal CNN for Classification of Human Actions in Videos
Traditionally, in deep learning based human activity recognition, either a few random frames or every $k^{th}$ frame of the video is used to train the 3D CNN, where $k$ is a small positive integer such as 4, 5, or 6.
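The two conventional frame-selection schemes this excerpt contrasts against can be written in a few lines; these helpers are illustrative, not the paper's proposed sampling technique.

```python
import random

def every_kth_frame(num_frames, k):
    """Indices of every k-th frame (fixed-stride sampling)."""
    return list(range(0, num_frames, k))

def random_frames(num_frames, n, rng=None):
    """Indices of n distinct frames chosen uniformly at random, sorted."""
    rng = rng or random
    return sorted(rng.sample(range(num_frames), n))
```

For a 20-frame video, `every_kth_frame(20, 5)` returns the four frames at stride 5, while `random_frames(20, 4)` returns four random (but temporally ordered) frame indices.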
Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues
The same action exhibits a wide range of intra-class variation while different actions show inter-class similarity, which makes action recognition in videos very challenging.