Action Recognition In Videos

64 papers with code • 17 benchmarks • 17 datasets

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.
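
As a minimal sketch (not taken from any benchmark's official evaluation code), mapping a clip to one of the predefined classes comes down to picking the highest-scoring class from a model's per-class scores. Classification benchmarks such as Kinetics-400 commonly report top-1 and top-5 accuracy over these predictions. The scores below are made-up placeholders standing in for a model's output, and the helper names are hypothetical.

```python
from typing import List, Sequence

def top_k_labels(scores: Sequence[float], k: int = 1) -> List[int]:
    """Indices of the k highest-scoring classes, best first (ties keep class order)."""
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]

def top_k_accuracy(all_scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int = 1) -> float:
    """Fraction of clips whose true class is among the k best-scoring classes."""
    hits = sum(1 for s, y in zip(all_scores, labels) if y in top_k_labels(s, k))
    return hits / len(labels)

# Hypothetical per-class scores for three clips; not output from any real model.
CLASSES = ["running", "jumping", "swimming"]
clip_scores = [
    [0.70, 0.20, 0.10],  # true class: running
    [0.20, 0.45, 0.35],  # true class: swimming (top-1 guess is jumping)
    [0.10, 0.10, 0.80],  # true class: swimming
]
true_labels = [0, 2, 2]

predicted = [CLASSES[top_k_labels(s)[0]] for s in clip_scores]
print(predicted)                                      # ['running', 'jumping', 'swimming']
print(top_k_accuracy(clip_scores, true_labels, k=1))  # 2 of 3 clips correct
print(top_k_accuracy(clip_scores, true_labels, k=2))  # 1.0
```

Widening k from 1 to 2 rescues the second clip, whose true class ranks second; this is why leaderboards often quote both numbers.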

Benchmarks

These leaderboards track progress in Action Recognition In Videos.

Dataset	Best Model
Jester (Gesture Recognition)	CPNet Res34, 5 CP
UCF101	STM (ImageNet+Kinetics pretrain)
Something-Something V2	CAST-B/16
Something-Something V1	STM (16 frames, ImageNet pretraining)
Kinetics-400	CAST-B/16
PKU-MMD	MMNet
Sports-1M	G-Blend
FS-Something-Something V2-Small	ITANet
FS-Something-Something V2-Full	ITANet
THUMOS’14	Single-stream R-C3D (two-way buffer)
AVA v2.2	YOWO+LFB*
HMDB-51	STM (ImageNet+Kinetics pretrain)
AVA v2.1	YOWO+LFB*
Kinetics-600	Florence
ActivityNet	LSTM + Pretrained on YT-8M
NTU RGB+D	2D-3D-Softargmax (RGB only)
miniSports	G-Blend

Libraries

Use these libraries to find Action Recognition In Videos models and implementations

open-mmlab/mmaction2 • 4 papers • 3,884 stars
yjxiong/caffe • 3 papers • 550 stars
towhee-io/towhee • 2 papers • 2,983 stars
MichiganCOG/M-PACT • 2 papers • 106 stars

5 libraries in total (4 shown).

Subtasks

Action Anticipation

Most implemented papers

Learning Spatiotemporal Features with 3D Convolutional Networks

facebookarchive/C3D • ICCV 2015

We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset.
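
The key difference from image ConvNets is that a 3D kernel slides over time as well as space, so a single filter can respond to change between frames. The following pure-Python sketch shows one single-channel 3D convolution (stride 1, no padding); it is not code from facebookarchive/C3D. The 3×3×3 kernel size echoes the C3D paper's finding that small homogeneous 3×3×3 kernels work well, but the kernel values and toy videos are hypothetical.

```python
from typing import List

Volume = List[List[List[float]]]  # indexed as volume[t][y][x]

def conv3d_valid(video: Volume, kernel: Volume) -> Volume:
    """Single-channel 3D convolution, stride 1, no padding.

    Like deep-learning libraries, this computes cross-correlation (the kernel
    is not flipped). Each output value mixes a window that spans several
    frames as well as a spatial neighbourhood.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Volume = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

# Toy inputs: 4 frames of 5x5 pixels; one brightens by 1 per frame, one is static.
brightening = [[[float(t)] * 5 for _ in range(5)] for t in range(4)]
static = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
# Hypothetical 3x3x3 kernel: -1 on the first frame, 0 on the middle, +1 on the last,
# so it responds to change over time (a 2D kernel on one frame cannot see this).
kernel = [[[v] * 3 for _ in range(3)] for v in (-1.0, 0.0, 1.0)]

moving = conv3d_valid(brightening, kernel)
print(len(moving), len(moving[0]), len(moving[0][0]))         # 2 3 3
print(moving[0][0][0], conv3d_valid(static, kernel)[0][0][0])  # 18.0 0.0
```

The filter fires on the brightening clip and stays silent on the static one, which is the spatiotemporal sensitivity the abstract refers to; a real C3D layer stacks many such filters over multiple input channels.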

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

yjxiong/temporal-segment-networks • 2 Aug 2016

The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network.
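
Temporal segment networks model a whole video sparsely. They split it into K equal segments, take one short snippet per segment, and combine the per-snippet class scores with a segmental consensus function. The sketch below reduces a snippet to a single frame index and uses averaging, one of the consensus functions the TSN papers consider. Picking a random frame per segment during training follows the paper's description; the deterministic middle-frame choice at test time is an assumption made here for reproducibility and differs from the paper's own testing protocol. Function names are hypothetical, not from yjxiong/temporal-segment-networks.

```python
import random
from typing import List, Optional, Sequence

def tsn_segment_indices(num_frames: int, num_segments: int = 3, train: bool = True,
                        rng: Optional[random.Random] = None) -> List[int]:
    """Split frames [0, num_frames) into num_segments equal segments and take
    one frame per segment: a random frame when training, the middle frame otherwise."""
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = rng if rng is not None else random.Random()
    indices = []
    for k in range(num_segments):
        start = k * num_frames // num_segments      # inclusive
        end = (k + 1) * num_frames // num_segments  # exclusive, always > start here
        if train:
            indices.append(rng.randrange(start, end))
        else:
            indices.append(start + (end - start) // 2)
    return indices

def average_consensus(snippet_scores: Sequence[Sequence[float]]) -> List[float]:
    """Video-level scores as the per-class mean over the sampled snippets."""
    n = len(snippet_scores)
    return [sum(per_class) / n for per_class in zip(*snippet_scores)]

print(tsn_segment_indices(30, train=False))                       # [5, 15, 25]
print(tsn_segment_indices(30, train=True, rng=random.Random(0)))  # one index per segment
print(average_consensus([[3.0, 0.0], [0.0, 3.0], [3.0, 0.0]]))    # [2.0, 1.0]
```

Because only K snippets are processed per video regardless of its length, the cost of covering the full duration stays fixed.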

SlowFast Networks for Video Recognition

facebookresearch/SlowFast • ICCV 2019

We present SlowFast networks for video recognition.
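
SlowFast pairs a Slow pathway, which reads few frames at a low frame rate, with a Fast pathway, which reads α times as many frames (α = 8 in the paper) while using far fewer channels. Lateral connections feed Fast features into the Slow pathway. The sketch below only computes which raw frames each pathway would read from a 64-frame clip with Slow stride τ = 16, corresponding to the paper's 4×16 setting as we understand it. The function name and the plain strided sampling are illustrative assumptions, not code from facebookresearch/SlowFast.

```python
from typing import List, Tuple

def slowfast_frame_indices(clip_len: int = 64, tau: int = 16,
                           alpha: int = 8) -> Tuple[List[int], List[int]]:
    """Raw-frame indices read by each pathway of a SlowFast-style model.

    Slow pathway: one frame every `tau` frames (low frame rate).
    Fast pathway: one frame every `tau // alpha` frames (alpha times more frames).
    """
    if tau % alpha != 0:
        raise ValueError("this sketch assumes tau is divisible by alpha")
    slow = list(range(0, clip_len, tau))
    fast = list(range(0, clip_len, tau // alpha))
    return slow, fast

slow, fast = slowfast_frame_indices()
print(slow)                 # [0, 16, 32, 48]
print(len(fast), fast[:4])  # 32 [0, 2, 4, 6]
```

Every Slow frame is also a Fast frame, so the two pathways stay temporally aligned, which is what lets lateral connections fuse them.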

Temporal Segment Networks for Action Recognition in Videos

yjxiong/temporal-segment-networks • 8 May 2017

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

wushidonguc/two-stream-action-recognition-keras • 3 Dec 2012

To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, large number of clips and also unconstrained nature of such clips.

Two-Stream Convolutional Networks for Action Recognition in Videos

feichtenhofer/twostreamfusion • NeurIPS 2014

Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art.
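
In the two-stream architecture, a spatial stream sees a single RGB frame and a temporal stream sees a stack of optical-flow fields from L consecutive frames. Each flow field has a horizontal and a vertical component, giving 2L input channels (the paper uses L = 10). The two streams' class scores are fused late. The paper fuses either by averaging or with a linear SVM on the scores; the sketch below shows only weighted averaging of softmax scores, with made-up logits and a hypothetical 50/50 weight.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def flow_stack_channels(num_flow_frames: int) -> int:
    """Temporal-stream input depth: an x and a y flow channel per frame."""
    return 2 * num_flow_frames

def late_fusion(spatial_logits: Sequence[float], temporal_logits: Sequence[float],
                spatial_weight: float = 0.5) -> List[float]:
    """Weighted average of the two streams' softmax scores."""
    ps, pt = softmax(spatial_logits), softmax(temporal_logits)
    return [spatial_weight * a + (1.0 - spatial_weight) * b for a, b in zip(ps, pt)]

# Made-up logits: appearance mildly prefers class 0, motion strongly prefers class 1.
fused = late_fusion([1.0, 0.5, 0.0], [0.0, 3.0, 0.0])
print(flow_stack_channels(10))                         # 20
print(max(range(len(fused)), key=fused.__getitem__))   # 1
```

Here the confident temporal stream overrides the weak appearance cue, illustrating why motion information complements single-frame recognition.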

YouTube-8M: A Large-Scale Video Classification Benchmark

google/youtube-8m • 27 Sep 2016

Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.

Towards Good Practices for Very Deep Two-Stream ConvNets

yjxiong/caffe • 8 Jul 2015

However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.

Temporal Relational Reasoning in Videos

metalbubble/TRN-pytorch • ECCV 2018

Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.
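
The Temporal Relation Network's basic 2-frame relation can be written as T_2(V) = h(Σ_{i<j} g(f_i, f_j)), where f_i are features of time-ordered frames and g and h are learned MLPs in the paper. The paper also combines relations over larger frame tuples; only the 2-frame term is sketched here. The g and h below are hypothetical hand-written stand-ins for the MLPs, chosen to make the role of frame order visible, and the code assumes g returns vectors as long as the frame features.

```python
from itertools import combinations
from typing import Callable, List, Sequence

Feature = List[float]

def two_frame_relation(frames: Sequence[Feature],
                       g: Callable[[Feature, Feature], Feature],
                       h: Callable[[Feature], Feature]) -> Feature:
    """T_2(V) = h( sum over i < j of g(f_i, f_j) ) for time-ordered frame features."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    total: Feature = [0.0] * len(frames[0])  # assumes g's output matches feature length
    for f_i, f_j in combinations(frames, 2):  # yields pairs with i < j, in time order
        total = [a + b for a, b in zip(total, g(f_i, f_j))]
    return h(total)

def g(a: Feature, b: Feature) -> Feature:
    """Hypothetical stand-in for the learned g: what changed from f_i to f_j."""
    return [y - x for x, y in zip(a, b)]

def h(v: Feature) -> Feature:
    """Hypothetical stand-in for the learned h: identity readout."""
    return v

frames = [[0.0], [1.0], [3.0]]                # a 1-D feature that grows over time
print(two_frame_relation(frames, g, h))        # [6.0]
print(two_frame_relation(frames[::-1], g, h))  # [-6.0]
```

Reversing the frames flips the sign of the relation, whereas simply averaging the frame features would give the same result in both orders; this order sensitivity is what distinguishes relational reasoning from plain temporal pooling.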

Representation Flow for Action Recognition

piergiaj/representation-flow-cvpr19 • CVPR 2019

Our representation flow layer is a fully-differentiable layer designed to capture the 'flow' of any representation channel within a convolutional neural network for action recognition.
