Video Classification
172 papers with code • 11 benchmarks • 17 datasets
Video Classification is the task of producing a label that is relevant to the video given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video given the features and annotations of the individual frames. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame inside the video.
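The frame-to-video aggregation described above can be sketched with the simplest strategy, mean pooling of per-frame class probabilities. This is an illustrative example, not a method from any of the papers below; the probabilities and label names are made up:

```python
import numpy as np

# Hypothetical per-frame class probabilities for a 4-frame clip over
# three candidate labels. A frame-level classifier may strongly favor
# "tree" on a single close-up frame, but averaging across all frames
# recovers the label that best describes the whole video.
labels = ["tree", "hiking", "beach"]
frame_probs = np.array([
    [0.70, 0.20, 0.10],  # close-up of a tree
    [0.10, 0.80, 0.10],  # trail shot
    [0.15, 0.75, 0.10],  # trail shot
    [0.20, 0.70, 0.10],  # hikers in frame
])

# Mean pooling: the simplest frame-to-video aggregation.
video_probs = frame_probs.mean(axis=0)
video_label = labels[int(np.argmax(video_probs))]
print(video_label)  # -> hiking
```

In practice, learned aggregators (temporal pooling layers, recurrent networks, or transformers) replace the plain average, but the principle of combining frame-level evidence into one video-level decision is the same.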
Latest papers with no code
Deep Unsupervised Key Frame Extraction for Efficient Video Classification
The proposed TSDPC is a generic and powerful framework with two advantages over previous work; one is that it can determine the number of key frames automatically.
BOREx: Bayesian-Optimization--Based Refinement of Saliency Map for Image- and Video-Classification Models
We propose a new black-box method BOREx (Bayesian Optimization for Refinement of visual model Explanation) to refine a heat map produced by any method.
Transfer-learning for video classification: Video Swin Transformer on multiple domains
From the results, we conclude that VST generalizes well enough to classify out-of-domain videos without retraining when the target classes are of the same type as the classes used to train the model.
Linear Video Transformer with Feature Fixation
Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.
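The idea of reweighting query and key features before a linear attention kernel can be illustrated with a minimal sketch. The gating vectors below are hypothetical placeholders standing in for the paper's learned feature fixation module, and the feature map is the common elu+1 choice, assumed here rather than taken from the paper:

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map commonly used in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_with_fixation(q, k, v, gate_q, gate_k):
    # Illustrative "feature fixation": scale query/key channels with
    # per-channel gates before the attention kernel. Real modules
    # would learn these gates from the input.
    q = elu_plus_one(q * gate_q)
    k = elu_plus_one(k * gate_k)
    # Linear attention: O(n * d^2) instead of the O(n^2 * d) softmax form.
    kv = k.T @ v              # (d, d_v) summary of keys and values
    z = q @ k.sum(axis=0)     # (n,) per-query normalizer
    return (q @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4                   # sequence length, feature dimension
q, k, v = rng.normal(size=(3, n, d))
gate_q = rng.uniform(0.5, 1.5, size=d)
gate_k = rng.uniform(0.5, 1.5, size=d)
out = linear_attention_with_fixation(q, k, v, gate_q, gate_k)
print(out.shape)  # (8, 4)
```

Because the feature map is strictly positive, the normalizer `z` is always positive and the output stays finite for any input, which is what makes the kernel-summary factorization valid.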
FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification
Furthermore, the holistic features are refined by the multi-scale temporal relations in a novel fusion module for yielding more discriminative video representations.
Traffic Congestion Prediction using Deep Convolutional Neural Networks: A Color-coding Approach
This work proposes a unique technique for traffic video classification using a color-coding scheme before training the traffic data in a Deep convolutional neural network.
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Our work empirically explores the low data regime for video classification and discovers that, surprisingly, transformers perform extremely well in the low-labeled video setting compared to CNNs.
UAV-CROWD: Violent and non-violent crowd activity simulator from the perspective of UAV
Unmanned Aerial Vehicles (UAVs) have gained significant traction in recent years, particularly in the context of surveillance.
Motion Sensitive Contrastive Learning for Self-supervised Video Representation
Contrastive learning has shown great potential in video representation learning.
Two-Stream Transformer Architecture for Long Video Understanding
Pure vision transformer architectures are highly effective for short video classification and action recognition tasks.