Action Classification
227 papers with code • 24 benchmarks • 30 datasets
Image source: The Kinetics Human Action Video Dataset
Most implemented papers
ECO: Efficient Convolutional Network for Online Video Understanding
In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.
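A minimal sketch of the idea (not the paper's exact architecture): a shared 2D CNN extracts per-frame features, the feature maps are stacked along time, and a small 3D network mixes them so the whole clip is processed in a single pass. All layer sizes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EcoStyleNet(nn.Module):
    """Toy ECO-style network: 2D features per frame, then 3D conv across time."""
    def __init__(self, num_classes=400, feat_ch=96):
        super().__init__()
        # Shared 2D backbone applied to every sampled frame (illustrative depth).
        self.backbone2d = nn.Sequential(
            nn.Conv2d(3, feat_ch, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
        )
        # 3D head that mixes information across the whole frame stack.
        self.head3d = nn.Sequential(
            nn.Conv3d(feat_ch, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, video):                       # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        frames = video.flatten(0, 1)                # (B*T, 3, H, W)
        feats = self.backbone2d(frames)             # (B*T, C, H', W')
        feats = feats.view(b, t, *feats.shape[1:])  # (B, T, C, H', W')
        feats = feats.permute(0, 2, 1, 3, 4)        # (B, C, T, H', W') for Conv3d
        pooled = self.head3d(feats).flatten(1)      # (B, 128)
        return self.classifier(pooled)

logits = EcoStyleNet()(torch.randn(2, 8, 3, 112, 112))  # -> (2, 400)
```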
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
We launch EVA, a vision-centric foundation model to explore the limits of masked visual representation learning at scale using only publicly accessible data.
Temporal Relational Reasoning in Videos
Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.
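The sort of module this motivates can be sketched in a few lines: pairwise temporal relations T2(V) = h_phi(sum over i<j of g_theta([f_i, f_j])) computed over time-ordered frame features, as in TRN-style reasoning. The feature dimensions and the small MLPs below are illustrative assumptions.

```python
import itertools
import torch
import torch.nn as nn

class PairwiseTemporalRelation(nn.Module):
    """T2(V) = h_phi( sum_{i<j} g_theta([f_i, f_j]) ) over per-frame features."""
    def __init__(self, feat_dim=256, hidden=256, num_classes=174):
        super().__init__()
        self.g_theta = nn.Sequential(nn.Linear(2 * feat_dim, hidden), nn.ReLU())
        self.h_phi = nn.Linear(hidden, num_classes)

    def forward(self, frame_feats):                 # (B, T, D), time-ordered
        t = frame_feats.size(1)
        rel = 0
        for i, j in itertools.combinations(range(t), 2):
            pair = torch.cat([frame_feats[:, i], frame_feats[:, j]], dim=-1)
            rel = rel + self.g_theta(pair)          # accumulate pairwise relations
        return self.h_phi(rel)

scores = PairwiseTemporalRelation()(torch.randn(4, 8, 256))  # -> (4, 174)
```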
Representation Flow for Action Recognition
Our representation flow layer is a fully-differentiable layer designed to capture the 'flow' of any representation channel within a convolutional neural network for action recognition.
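The actual layer unrolls TV-L1 optical-flow iterations with learnable parameters; the toy sketch below keeps only the essential property, a fully differentiable, iterative flow estimate computed per representation channel, substituting plain gradient steps on a brightness-constancy residual for the TV-L1 updates.

```python
import torch

def representation_flow(f1, f2, iters=10, step=0.5):
    """Crude differentiable flow between consecutive feature maps (B, C, H, W).

    Runs gradient-descent iterations on the linearized brightness-constancy
    residual r = Ix*u + Iy*v + It, treated independently per channel.
    """
    # Spatial gradients of f1 (central differences along H, then W).
    iy, ix = torch.gradient(f1, dim=(2, 3))
    it = f2 - f1                                   # temporal difference
    u = torch.zeros_like(f1)
    v = torch.zeros_like(f1)
    for _ in range(iters):
        r = ix * u + iy * v + it                   # constancy residual
        u = u - step * r * ix                      # gradient step on 0.5 * r**2
        v = v - step * r * iy
    return u, v

u, v = representation_flow(torch.randn(2, 32, 14, 14), torch.randn(2, 32, 14, 14))
```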
Revisiting 3D ResNets for Video Recognition
A recent work by Bello et al. shows that training and scaling strategies may be more significant than model architectures for visual recognition.
Masked Feature Prediction for Self-Supervised Visual Pre-Training
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training of video models.
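A minimal sketch of the objective: replace a random subset of tokens with a learnable mask token, encode the full sequence, and regress the target features (HOG descriptors in the paper) at the masked positions only. Encoder depth, widths, and the target dimension below are illustrative.

```python
import torch
import torch.nn as nn

class MaskFeatSketch(nn.Module):
    """Replace masked tokens, encode, regress target features at masked positions."""
    def __init__(self, dim=256, target_dim=108, depth=2):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.pred = nn.Linear(dim, target_dim)      # regress e.g. HOG descriptors

    def forward(self, tokens, targets, mask):       # tokens: (B, N, D), mask: (B, N) bool
        x = torch.where(mask[..., None], self.mask_token.expand_as(tokens), tokens)
        pred = self.pred(self.encoder(x))           # (B, N, target_dim)
        # The loss is taken only on masked positions.
        return ((pred - targets) ** 2)[mask].mean()

tokens, targets = torch.randn(2, 49, 256), torch.randn(2, 49, 108)
mask = torch.rand(2, 49) < 0.4
loss = MaskFeatSketch()(tokens, targets, mask)
```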
CoCa: Contrastive Captioners are Image-Text Foundation Models
We apply a contrastive loss between unimodal image and text embeddings, in addition to a captioning loss on the outputs of the multimodal decoder, which predicts text tokens autoregressively.
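A sketch of that combined objective, assuming the unimodal embeddings and decoder logits are already computed; the loss weights and temperature below are illustrative, not necessarily the paper's settings.

```python
import torch
import torch.nn.functional as F

def coca_style_loss(img_emb, txt_emb, caption_logits, caption_targets,
                    temperature=0.07, lam_con=1.0, lam_cap=2.0):
    """Contrastive loss on unimodal embeddings plus autoregressive captioning loss."""
    img = F.normalize(img_emb, dim=-1)              # (B, D)
    txt = F.normalize(txt_emb, dim=-1)              # (B, D)
    logits = img @ txt.t() / temperature            # (B, B) similarity matrix
    labels = torch.arange(logits.size(0))           # matched pairs lie on the diagonal
    contrastive = (F.cross_entropy(logits, labels) +
                   F.cross_entropy(logits.t(), labels)) / 2
    # Captioning: next-token cross-entropy over the multimodal decoder outputs.
    captioning = F.cross_entropy(caption_logits.flatten(0, 1),
                                 caption_targets.flatten())
    return lam_con * contrastive + lam_cap * captioning

loss = coca_style_loss(torch.randn(4, 512), torch.randn(4, 512),
                       torch.randn(4, 16, 30000), torch.randint(0, 30000, (4, 16)))
```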
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
In this study, we focus on transferring knowledge for video classification tasks.
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: we introduce a Video Attribute Association mechanism that leverages Video-to-Text knowledge to generate textual auxiliary attributes to complement video recognition.
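A hedged sketch of the retrieval step behind such an attribute mechanism, assuming video and word embeddings already live in a shared CLIP-style space; the lexicon, embeddings, and k below are illustrative placeholders, and the full framework also feeds the retrieved attributes back into recognition.

```python
import torch
import torch.nn.functional as F

def video_attributes(video_emb, word_embs, lexicon, k=5):
    """Retrieve the k lexicon words closest to each video in a shared
    vision-language embedding space, to serve as auxiliary textual attributes."""
    sims = F.normalize(video_emb, dim=-1) @ F.normalize(word_embs, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices             # (B, k) word indices per video
    return [[lexicon[i] for i in row] for row in topk.tolist()]

lexicon = ["running", "ball", "water", "guitar", "jumping", "kitchen"]
attrs = video_attributes(torch.randn(2, 512), torch.randn(len(lexicon), 512),
                         lexicon, k=3)              # e.g. [['ball', 'water', ...], ...]
```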
TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets operating on spatiotemporal feature matrices can exploit spatiotemporal dynamics to improve overall performance.
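A minimal sketch of the RNN branch: an LSTM consumes per-frame feature vectors and classifies from the final hidden state. The paper additionally segments features over time and pairs this with a Temporal-ConvNet (Temporal-Inception) branch; the dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class TSLSTMSketch(nn.Module):
    """LSTM over per-frame CNN features, classifying from the last hidden state."""
    def __init__(self, feat_dim=2048, hidden=512, num_classes=101):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, feats):                      # feats: (B, T, feat_dim)
        out, _ = self.lstm(feats)                  # (B, T, hidden)
        return self.fc(out[:, -1])                 # classify from final time step

logits = TSLSTMSketch()(torch.randn(2, 25, 2048))  # -> (2, 101)
```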