Action Recognition


Greatest papers with code

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

CVPR 2018 tensorflow/models

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.

ACTION RECOGNITION VIDEO UNDERSTANDING
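
Because one person can carry several of the 80 action labels at the same moment, AVA is naturally a multi-label problem. Below is a minimal sketch of turning annotation rows into one multi-hot target per person. It assumes simplified rows of the form (video_id, timestamp, person_id, action_id), with action ids numbered 1 to 80. This is a simplification: the released annotation files also carry box coordinates. The rows and the function name are hypothetical.

```python
from collections import defaultdict

NUM_ACTIONS = 80  # AVA's 80 atomic visual actions

def multi_hot_targets(rows):
    """Group (video_id, timestamp, person_id, action_id) rows into one
    multi-hot vector per person per timestamp.

    Assumes action ids are 1-indexed (1..NUM_ACTIONS).
    """
    targets = defaultdict(lambda: [0] * NUM_ACTIONS)
    for video_id, timestamp, person_id, action_id in rows:
        targets[(video_id, timestamp, person_id)][action_id - 1] = 1
    return dict(targets)

# Hypothetical rows: person 0 performs two actions at once, person 1 one.
rows = [
    ("vid_a", 902, 0, 12),
    ("vid_a", 902, 0, 80),
    ("vid_a", 902, 1, 12),
]
targets = multi_hot_targets(rows)
print(sum(targets[("vid_a", 902, 0)]))  # 2
```

With targets shaped like this, a model typically scores each action with an independent sigmoid rather than a single softmax over all 80 classes.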

Unsupervised Learning of Object Structure and Dynamics from Videos

NeurIPS 2019 google-research/google-research

Extracting and predicting object structure and dynamics from videos without supervision is a major challenge in machine learning.

ACTION RECOGNITION CONTINUOUS CONTROL OBJECT TRACKING VIDEO PREDICTION

Large-scale weakly-supervised pre-training for video action recognition

CVPR 2019 microsoft/computervision-recipes

Frame-based models perform quite well on action recognition. Is pre-training for good image features sufficient, or is pre-training for spatio-temporal features valuable for optimal transfer learning?

ACTION CLASSIFICATION ACTION RECOGNITION ACTIVITY RECOGNITION IN VIDEOS EGOCENTRIC ACTIVITY RECOGNITION TRANSFER LEARNING

A Closer Look at Spatiotemporal Convolutions for Action Recognition

CVPR 2018 microsoft/computervision-recipes

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.

ACTION RECOGNITION
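
One form studied in "A Closer Look at Spatiotemporal Convolutions" is the (2+1)D block. It factorizes a t×d×d 3D convolution into a d×d spatial convolution producing M intermediate channels, followed by a t×1×1 temporal convolution. The sketch below counts the weights in both variants, ignoring bias terms. It picks M so that the two counts match, following the choice described in that paper: M = floor(t·d²·N_in·N_out / (d²·N_in + t·N_out)). The function names are hypothetical.

```python
def conv3d_params(t, d, n_in, n_out):
    """Weights in a full t x d x d 3D convolution (bias ignored)."""
    return t * d * d * n_in * n_out

def conv2plus1d_params(t, d, n_in, n_out):
    """Weights in a (2+1)D factorization (bias ignored).

    Returns (m, total), where m intermediate channels are chosen so the
    total roughly matches the full 3D convolution's weight count.
    """
    m = (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)
    spatial = d * d * n_in * m    # 1 x d x d conv: n_in -> m channels
    temporal = t * m * n_out      # t x 1 x 1 conv: m -> n_out channels
    return m, spatial + temporal

print(conv3d_params(3, 3, 64, 64))       # 110592
print(conv2plus1d_params(3, 3, 64, 64))  # (144, 110592)
```

At an equal parameter budget, the factorized version gains an extra nonlinearity between the spatial and temporal steps.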

A Multigrid Method for Efficiently Training Video Models

CVPR 2020 facebookresearch/SlowFast

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

ACTION DETECTION ACTION RECOGNITION VIDEO UNDERSTANDING
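
The idea behind the multigrid schedule is to train on mini-batches whose spatiotemporal shape varies during training. When clips are smaller, the batch is made larger so that the computation per iteration stays roughly constant. Below is a minimal sketch of that scaling rule only. It assumes a base of 8 clips of 8 frames at 224×224, plus a hypothetical list of coarser shapes. It is not the paper's exact schedule and omits details such as learning-rate handling.

```python
def scaled_batch_sizes(base_batch, base_shape, shapes):
    """For each (T, H, W) clip shape, pick a batch size so that
    batch * T * H * W stays equal to the base configuration's value."""
    bt, bh, bw = base_shape
    base_volume = base_batch * bt * bh * bw
    return [(shape, base_volume // (shape[0] * shape[1] * shape[2]))
            for shape in shapes]

# Hypothetical coarse-to-fine shapes, ending at the base shape.
shapes = [(2, 112, 112), (4, 112, 112), (4, 224, 224), (8, 224, 224)]
for shape, batch in scaled_batch_sizes(8, (8, 224, 224), shapes):
    print(shape, batch)
# (2, 112, 112) 128
# (4, 112, 112) 64
# (4, 224, 224) 16
# (8, 224, 224) 8
```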

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

CVPR 2018 kenshohara/3D-ResNets-PyTorch

The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels.

ACTION RECOGNITION
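
A 3D kernel slides along time as well as height and width, so each output value mixes information from neighboring frames. Below is a minimal single-channel NumPy sketch with stride 1 and no padding. As in most deep learning libraries, it is written as a cross-correlation. The function name is hypothetical.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Single-channel 3D cross-correlation, stride 1, no padding.

    video:  array of shape (T, H, W)
    kernel: array of shape (kt, kh, kw)
    returns array of shape (T - kt + 1, H - kh + 1, W - kw + 1)
    """
    kt, kh, kw = kernel.shape
    T, H, W = video.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Each output sums over kt frames, not just one image.
                patch = video[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(patch * kernel)
    return out

video = np.ones((4, 5, 5))   # 4 frames of 5x5 pixels
kernel = np.ones((3, 3, 3))  # 3x3x3 spatio-temporal kernel
out = conv3d_valid(video, kernel)
print(out.shape)     # (2, 3, 3)
print(out[0, 0, 0])  # 27.0
```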

YouTube-8M: A Large-Scale Video Classification Benchmark

27 Sep 2016 google/youtube-8m

Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.

Ranked #1 on Action Recognition on ActivityNet (using extra training data)

ACTION RECOGNITION