Action Classification

227 papers with code • 24 benchmarks • 30 datasets

Image source: The Kinetics Human Action Video Dataset

Benchmarks

These leaderboards track progress in Action Classification.

Dataset	Best Model
Kinetics-400	InternVideo2-6B
Kinetics-600	InternVideo2-6B
Charades	TokenLearner
Kinetics-700	InternVideo2-6B
MiT	InternVideo2-6B
Toyota Smarthome dataset	π-ViT
AViD	TokenLearner
THUMOS’14	3C-Net
ActivityNet-1.2	W-TALC
Kinetics-Sounds	Mirasol3B
TTStroke-21 ME22	RGB and PRGB
HMDB51	DualPath w/ ViT-B/16 MLPs.
MiniKinetics	MARS+RGB+Flow (16 frames)
YouCook2	VideoBERT (cross modal)
UCF101	Ours
Something-Something V2	CAST-B/16
THUMOS'14	3C-Net
Jester test	C2F
BABEL	2s-AGCN
ActivityNet	UniFormerV2-L
TTStroke-21 ME21	STCNN
Diving-48	DualPath w/ ViT-B/16
CelebV-HQ	MARLIN
Moments in Time	OmniVec

Libraries

Use these libraries to find Action Classification models and implementations.

Library	Papers	Stars
open-mmlab/mmaction2	15	3,916
towhee-io/towhee	8	3,005
rwightman/pytorch-image-models	4	29,890
facebookresearch/pytorchvideo	3	3,187

See all 18 libraries.

Datasets

Most implemented papers

High Quality Monocular Depth Estimation via Transfer Learning

ialhashim/DenseDepth • 31 Dec 2018

Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

open-mmlab/mmaction2 • CVPR 2017

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks.

Non-local Neural Networks

facebookresearch/video-nonlocal-net • CVPR 2018

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

facebookresearch/OctConv • ICCV 2019

Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies.

A Closer Look at Spatiotemporal Convolutions for Action Recognition

facebookresearch/R2Plus1D • CVPR 2018

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

yjxiong/temporal-segment-networks • 2 Aug 2016

The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network.

Swin Transformer V2: Scaling Up Capacity and Resolution

microsoft/Swin-Transformer • CVPR 2022

Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) A log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) A self-supervised pre-training method, SimMIM, to reduce the needs of vast labeled images.
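
As a rough illustration of the first two techniques, the following pure-Python sketch computes a scaled cosine attention logit and the log-spaced coordinate transform. It is a minimal sketch under stated assumptions, not the repository's implementation: the function names `cosine_attention_logit` and `log_spaced` are hypothetical, `tau=0.1` is an arbitrary illustrative value (the temperature is a learned parameter in the paper), and the position bias is passed in as a plain number rather than produced by a small network.

```python
import math


def log_spaced(delta: float) -> float:
    """Map a linear relative offset to log space: sign(d) * log(1 + |d|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def cosine_attention_logit(q, k, tau=0.1, bias=0.0):
    """Cosine similarity of query q and key k, divided by tau, plus a bias.

    tau and bias are illustrative placeholders (assumptions), not values
    taken from the paper or its code.
    """
    dot = sum(a * b for a, b in zip(q, k))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in k))
    return dot / norm / tau + bias


if __name__ == "__main__":
    # Growing a window from 8x8 to 16x16 stretches the largest linear
    # offset from 7 to 15, but the log-spaced offset grows far less.
    print(15 / 7, log_spaced(15) / log_spaced(7))
    # The logit depends on the angle between q and k, not their magnitudes.
    print(cosine_attention_logit([1.0, 0.0], [5.0, 0.0]))
```

Because the logit is built from a normalized similarity, it stays bounded regardless of activation magnitude, and the log transform compresses large offsets so that a bias learned on small windows needs to extrapolate less on larger ones.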

SlowFast Networks for Video Recognition

facebookresearch/SlowFast • ICCV 2019

We present SlowFast networks for video recognition.

Video Swin Transformer

SwinTransformer/Video-Swin-Transformer • CVPR 2022

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module • ICCV 2019

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.
