Action Classification

225 papers with code • 23 benchmarks • 30 datasets

Most implemented papers

Is Space-Time Attention All You Need for Video Understanding?

facebookresearch/TimeSformer 9 Feb 2021

We present a convolution-free approach to video classification built exclusively on self-attention over space and time.
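As a rough illustration of what "self-attention over space and time" can mean, the sketch below splits a small clip into patch tokens and applies one head of joint space-time attention, so every patch attends to every other patch in every frame. It is a minimal NumPy sketch and not the TimeSformer implementation; the function names, patch size, and random projection weights are assumptions made for the example, and positional embeddings, multiple heads, and the classification token are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def video_to_tokens(video, patch):
    """Split a (T, H, W, C) clip into flattened non-overlapping patches.

    Returns an array of shape (T * H/patch * W/patch, patch * patch * C).
    """
    t, h, w, c = video.shape
    assert h % patch == 0 and w % patch == 0, "frame size must be divisible by patch"
    gh, gw = h // patch, w // patch
    x = video.reshape(t, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (T, gh, gw, patch, patch, C)
    return x.reshape(t * gh * gw, patch * patch * c)

def joint_space_time_attention(tokens, w_q, w_k, w_v):
    """Single-head attention where each token attends to all space-time tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (num_tokens, num_tokens)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

# Hypothetical toy clip: 4 frames of 32x32 RGB, 16x16 patches -> 4 * 2 * 2 tokens.
rng = np.random.default_rng(0)
clip = rng.standard_normal((4, 32, 32, 3))
tokens = video_to_tokens(clip, patch=16)

d_model = 8  # illustrative head dimension
w_q, w_k, w_v = (rng.standard_normal((tokens.shape[1], d_model)) * 0.02 for _ in range(3))
out, weights = joint_space_time_attention(tokens, w_q, w_k, w_v)
print(tokens.shape, out.shape)
```

The attention matrix grows quadratically with the number of space-time tokens, which is why video transformers often restrict or factorize attention in practice.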

The Kinetics Human Action Video Dataset

deepmind/kinetics-i3d 19 May 2017

We describe the DeepMind Kinetics human action video dataset.

Temporal Segment Networks for Action Recognition in Videos

yjxiong/temporal-segment-networks 8 May 2017

Based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

Graph-Based Global Reasoning Networks

facebookresearch/GloRe CVPR 2019

In this work, we propose a new approach for reasoning globally in which a set of features are globally aggregated over the coordinate space and then projected to an interaction space where relational reasoning can be efficiently computed.
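The aggregate, reason, and project-back pattern described in this abstract can be sketched roughly as follows. This is a simplified NumPy illustration under stated assumptions, not the GloRe code: `global_reasoning`, `w_proj`, `adj`, and `w_state` are hypothetical names, the weights are random, and the reasoning step is reduced to one node-mixing matrix followed by one channel-mixing matrix.

```python
import numpy as np

def global_reasoning(x, w_proj, adj, w_state):
    """Illustrative global reasoning over a small set of nodes.

    x:       (L, C) features at L coordinate-space locations
    w_proj:  (C, N) weights producing a soft assignment of locations to N nodes
    adj:     (N, N) node-to-node mixing matrix (assumed, stands in for the graph)
    w_state: (C, C) per-node feature update
    """
    b = x @ w_proj                 # (L, N) projection weights
    nodes = b.T @ x                # (N, C) features aggregated into interaction space
    nodes = adj @ nodes @ w_state  # (N, C) relational reasoning between nodes
    y = b @ nodes                  # (L, C) projected back to coordinate space
    return x + y                   # residual connection

# Hypothetical sizes: 6 locations, 4 channels, 3 nodes.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
w_proj = rng.standard_normal((4, 3)) * 0.1
adj = rng.standard_normal((3, 3)) * 0.1
w_state = rng.standard_normal((4, 4)) * 0.1
print(global_reasoning(x, w_proj, adj, w_state).shape)
```

Because the number of nodes N is much smaller than the number of locations L, the pairwise interaction is computed on an N x N matrix rather than an L x L one.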

X3D: Expanding Architectures for Efficient Video Recognition

facebookresearch/SlowFast CVPR 2020

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth.

ViViT: A Video Vision Transformer

google-research/scenic ICCV 2021

We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification.

Two-Stream Convolutional Networks for Action Recognition in Videos

feichtenhofer/twostreamfusion NeurIPS 2014

Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art.

Video Classification with Channel-Separated Convolutional Networks

facebookresearch/VMZ ICCV 2019

It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.
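To make the first question concrete, the arithmetic below compares the weight count of a dense 3D convolution with a channel-separated alternative, here assumed to be a depthwise 3x3x3 convolution (groups equal to channels) followed by a pointwise 1x1x1 convolution. The helper name and the 256-channel layer are illustrative assumptions, not figures from the paper, and biases are ignored.

```python
def conv3d_params(c_in, c_out, kernel, groups=1):
    """Weight count of a 3D convolution with the given groups (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel
    return (c_in // groups) * c_out * kt * kh * kw

channels = 256      # hypothetical layer width
kernel = (3, 3, 3)  # temporal x height x width

dense = conv3d_params(channels, channels, kernel)
depthwise = conv3d_params(channels, channels, kernel, groups=channels)
pointwise = conv3d_params(channels, channels, (1, 1, 1))
separated = depthwise + pointwise

print(dense)                        # 1769472
print(separated)                    # 72448
print(round(dense / separated, 1))  # 24.4
```

Multiply-accumulate counts scale the same way, since each weight is applied once per output position.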

Multiscale Vision Transformers

facebookresearch/SlowFast ICCV 2021

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

ECO: Efficient Convolutional Network for Online Video Understanding

mzolfaghari/ECO-efficient-video-understanding ECCV 2018

In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.