Video Classification

172 papers with code • 11 benchmarks • 17 datasets

Video Classification is the task of producing one or more labels that describe a video given its frames. A good video-level classifier not only predicts accurate frame labels but also best describes the entire video given the features and annotations of its individual frames. For example, a video might contain a tree in some frames, but the label central to the video might be something else (e.g., “hiking”). The granularity of the labels needed depends on the task: typical settings include assigning one or more global labels to the whole video, or assigning one or more labels to each frame inside the video.
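
The distinction between frame-level and video-level labels can be made concrete with a minimal sketch. The pooling-then-scoring scheme below is a common baseline, not any specific paper's method; the function names, the averaging aggregator, and the toy weights are all illustrative assumptions.

```python
import numpy as np

def classify_video(frame_features, weights, labels):
    """Hypothetical video-level classifier: average-pool per-frame features
    into one video descriptor, then score each candidate label linearly."""
    video_feature = frame_features.mean(axis=0)   # (dim,) pooled over frames
    scores = weights @ video_feature              # one score per label
    return labels[int(np.argmax(scores))]

# Toy example: 4 frames with 3-dim features, two candidate video labels.
rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 3))
weights = np.eye(2, 3)  # stand-in for learned label weights
video_label = classify_video(frames, weights, ["hiking", "tree"])
```

A single frame showing a tree can still be outvoted by the pooled evidence of the other frames, which is the behavior the task description asks of a good video-level classifier.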

Source: Efficient Large Scale Video Classification

Most implemented papers

Token Shift Transformer for Video Classification

VideoNetworks/TokShift-Transformer 5 Aug 2021

It is worth noticing that our TokShift transformer is a pure, convolution-free video transformer with computational efficiency for video understanding.
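
The core token-shift idea can be sketched without any learned parameters: move a fraction of each token's channels one step forward or backward in time, leaving the rest untouched. This is a simplified, fixed sketch of the shift operation only (the `fold_div` split and tensor layout are assumptions), not the full TokShift transformer.

```python
import numpy as np

def token_shift(x, fold_div=4):
    """Hypothetical temporal token shift. x has shape (T, N, C):
    frames, tokens, channels. One quarter of the channels is shifted
    forward in time, one quarter backward, and the rest is unchanged."""
    t, n, c = x.shape
    fold = c // fold_div
    out = x.copy()
    out[1:, :, :fold] = x[:-1, :, :fold]                  # shift forward
    out[:-1, :, fold:2 * fold] = x[1:, :, fold:2 * fold]  # shift backward
    return out

x = np.arange(2 * 1 * 4, dtype=float).reshape(2, 1, 4)
y = token_shift(x)
```

The operation adds zero parameters and zero FLOPs beyond a memory copy, which is what makes shift-based temporal modeling cheap.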

Deep Temporal Linear Encoding Networks

bryanyzhu/two-stream-pytorch CVPR 2017

Advantages of TLEs are: (a) they encode the entire video into a compact feature representation, learning the semantics and a discriminative feature space; (b) they are applicable to all kinds of networks like 2D and 3D CNNs for video classification; and (c) they model feature interactions in a more expressive way and without loss of information.
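
Point (a), encoding the whole video into one compact descriptor, can be sketched as follows. This is a loose illustration assuming element-wise multiplication as the aggregation function and an outer-product bilinear encoding with the usual signed-sqrt/L2 normalization; the paper's actual TLE layer is learned end-to-end.

```python
import numpy as np

def temporal_linear_encoding(segment_features):
    """Hypothetical TLE-style sketch: aggregate per-segment features with
    element-wise multiplication, then bilinear-encode (outer product) into
    one compact video-level descriptor."""
    aggregated = segment_features[0]
    for f in segment_features[1:]:
        aggregated = aggregated * f                 # element-wise aggregation
    encoded = np.outer(aggregated, aggregated).ravel()  # bilinear encoding
    # signed square-root + L2 normalization, common for bilinear features
    encoded = np.sign(encoded) * np.sqrt(np.abs(encoded))
    return encoded / (np.linalg.norm(encoded) + 1e-12)

feats = [np.array([1.0, 2.0]), np.array([0.5, 1.0])]
desc = temporal_linear_encoding(feats)
```

Because the aggregation is defined on feature maps, the same recipe applies whether the segment features come from 2D or 3D CNN backbones, which is point (b) above.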

Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks

ZhaofanQiu/pseudo-3d-residual-networks ICCV 2017

In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters on spatial domain (equivalent to 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections on adjacent feature maps in time.
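
The efficiency argument behind this factorization is easy to verify by counting weights per input/output channel pair, as in the short sketch below.

```python
# Parameter count of a full 3x3x3 kernel vs. the P3D-style factorization:
# a 1x3x3 spatial kernel (2D-CNN-equivalent) followed by a 3x1x1 temporal
# kernel, per input/output channel pair.
full_3d = 3 * 3 * 3            # 27 weights
spatial = 1 * 3 * 3            # 9 weights on the spatial domain
temporal = 3 * 1 * 1           # 3 weights connecting adjacent frames
factored = spatial + temporal  # 12 weights, ~2.25x fewer than full 3D
```

The factorization is an approximation, not an exact decomposition of the full 3D convolution, but it lets the spatial part inherit 2D CNN structure while the temporal part stays cheap.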

Compact Generalized Non-local Network

KaiyuYue/cgnl-network.pytorch NeurIPS 2018

The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos.
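
The basic non-local operation (before the paper's compact generalization) can be sketched as self-attention over flattened space-time positions. The dot-product affinity and the omission of learned projections are simplifying assumptions here.

```python
import numpy as np

def nonlocal_block(x):
    """Hypothetical sketch of a dot-product non-local operation: every
    space-time position attends to every other one. x has shape (P, C)
    with P = T*H*W flattened positions; real modules add learned
    embedding/projection layers around this core."""
    attn = x @ x.T                                   # (P, P) pairwise affinities
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)    # softmax over positions
    return x + attn @ x                              # residual connection

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 4))   # 6 space-time positions, 4 channels
y = nonlocal_block(x)
```

The P×P affinity matrix is exactly the cost that motivates compact variants: it grows quadratically with the number of space-time positions.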

AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures

tensorflow/models ICLR 2020

Learning to represent videos is a very challenging task both algorithmically and computationally.

MotionSqueeze: Neural Motion Feature Learning for Video Understanding

arunos728/MotionSqueeze ECCV 2020

As the frame-by-frame optical flows require heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding.

VideoMix: Rethinking Data Augmentation for Video Classification

jayChung0302/videomix 7 Dec 2020

Recent data augmentation strategies have been reported to address the overfitting problems in static image classifiers.
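
A VideoMix-style augmentation extends CutMix to video: paste a region from one clip into another and mix the labels by area. The sketch below assumes a spatial box applied across all frames and single-channel clips; the paper studies several cuboid variants.

```python
import numpy as np

def videomix(video_a, video_b, label_a, label_b, box):
    """Hypothetical VideoMix-style sketch: paste a spatial cuboid from
    video_b into video_a across all frames, and mix the (one-hot) labels
    by the area ratio. Videos have shape (T, H, W); box = (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    mixed = video_a.copy()
    mixed[:, y0:y1, x0:x1] = video_b[:, y0:y1, x0:x1]
    t, h, w = video_a.shape
    lam = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label

a = np.zeros((2, 4, 4))
b = np.ones((2, 4, 4))
mixed, lab = videomix(a, b, np.array([1.0, 0.0]), np.array([0.0, 1.0]), (0, 2, 0, 2))
```

The soft label keeps the classifier's targets consistent with how much of each clip is actually visible, which is the mechanism that combats overfitting.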

Reinforcement Learning with Latent Flow

WendyShang/flare NeurIPS 2021

Temporal information is essential to learning effective policies with Reinforcement Learning (RL).

Busy-Quiet Video Disentangling for Video Classification

guoxih/Busy-Quiet-Video-Disentangling-for-Video-Classification 29 Mar 2021

We design a trainable Motion Band-Pass Module (MBPM) for separating busy information from quiet information in raw video data.
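
The busy/quiet split can be illustrated with fixed temporal filters: a frame difference acts as a high-pass that keeps fast-changing "busy" content, while a temporal mean acts as a low-pass that keeps static "quiet" content. This is only an analogy; the paper's MBPM is a trainable band-pass module, not these fixed filters.

```python
import numpy as np

def busy_quiet_split(video):
    """Hypothetical fixed-filter sketch of the busy/quiet idea:
    temporal difference ~ busy (motion) stream,
    temporal mean ~ quiet (static appearance) stream."""
    busy = np.diff(video, axis=0)   # (T-1, H, W) frame-to-frame changes
    quiet = video.mean(axis=0)      # (H, W) temporally averaged content
    return busy, quiet

# A clip whose brightness rises linearly frame to frame.
video = np.stack([np.full((2, 2), t, dtype=float) for t in range(4)])
busy, quiet = busy_quiet_split(video)
```

On this toy clip the busy stream is a constant rate of change and the quiet stream is the mid-level appearance, showing how the two carry complementary information.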

Out-of-Distribution Detection Using Union of 1-Dimensional Subspaces

zaeemzadeh/OOD CVPR 2021

In this paper, we argue that OOD samples can be detected more easily if the training data is embedded into a low-dimensional space, such that the embedded training samples lie on a union of 1-dimensional subspaces.
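
If each class occupies a 1-dimensional subspace (a line through the origin), a natural OOD score is the angle between an input and the nearest class line: near zero for in-distribution samples, large otherwise. The sketch below assumes unit class directions are given; the paper learns such an embedding rather than assuming it.

```python
import numpy as np

def ood_score(x, class_directions):
    """Hypothetical sketch: OOD score = smallest angle (radians) between
    input x and any class's 1-D subspace. class_directions holds unit
    vectors, one row per class; 0 means x lies on some class line."""
    x = x / np.linalg.norm(x)
    cos = np.abs(class_directions @ x)   # |cosine| handles both line directions
    return float(np.arccos(np.clip(cos.max(), -1.0, 1.0)))

dirs = np.array([[1.0, 0.0], [0.0, 1.0]])          # two class lines
in_dist = ood_score(np.array([2.0, 0.0]), dirs)    # lies on class 0's line
out_dist = ood_score(np.array([1.0, 1.0]), dirs)   # 45 degrees from both lines
```

Scale-invariance comes for free: any multiple of an in-distribution sample scores the same, since only the direction matters on a union of lines.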