Video Classification
175 papers with code • 11 benchmarks • 17 datasets
Video Classification is the task of producing a label that is relevant to a video given its frames. A good video-level classifier is one that not only provides accurate frame labels but also best describes the entire video given the features and annotations of its various frames. For example, a video might contain a tree in some frames, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video and assigning one or more labels to each frame of the video.
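As a minimal illustration of the global-label setting described above (not the method of any paper listed below), one common recipe is to extract a feature vector per frame, pool the features over time, and apply a classifier to the pooled descriptor. The sketch below uses NumPy with made-up dimensions and random weights; `classify_video`, the feature size, and the class count are all illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over class scores.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def classify_video(frame_features, weights, bias):
    """Mean-pool per-frame features over time, then apply a linear
    classifier to produce one global label for the whole video."""
    clip_feature = frame_features.mean(axis=0)   # (D,) pooled video descriptor
    logits = weights @ clip_feature + bias       # (C,) class scores
    probs = softmax(logits)
    return int(np.argmax(probs)), probs

# Toy example: 8 frames, 16-dim features, 3 classes (all values made up).
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))
W = rng.normal(size=(3, 16))
b = np.zeros(3)
label, probs = classify_video(frames, W, b)
```

In practice the per-frame features come from a pretrained backbone and the pooling step is often replaced by temporal models (recurrent networks, attention, or 3D convolutions), but the pool-then-classify structure is the same.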
Libraries
Use these libraries to find Video Classification models and implementations.

Datasets
Subtasks
Latest papers with no code
NetFlick: Adversarial Flickering Attacks on Deep Learning Based Video Compression
Experimental results demonstrate that NetFlick can successfully deteriorate the performance of video compression frameworks in both digital- and physical-settings and can be further extended to attack downstream video classification networks.
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
A point cloud deep-learning paradigm is introduced to the action recognition, and a unified framework along with a novel deep neural network architecture called Structured Keypoint Pooling is proposed.
Selective Structured State-Spaces for Long-Form Video Understanding
To address this limitation, we present a novel Selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos.
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
We show that visual representations learned under ViC-MAE generalize well to both video and image classification tasks.
Temporal Coherent Test-Time Optimization for Robust Video Classification
To exploit information in video with self-supervised learning, TeCo uses global content from video clips and optimizes models for entropy minimization.
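TeCo's exact procedure is not given here; as a generic sketch of the entropy-minimization objective the summary above mentions, the snippet below computes the average prediction entropy over a batch of clips, the quantity that test-time adaptation methods drive down by gradient descent on unlabeled test video. The function name and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Stable softmax along the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prediction_entropy(logits):
    """Shannon entropy of softmax predictions, averaged over clips.
    Test-time entropy minimization updates model parameters to reduce
    this value, making predictions more confident on test data."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

# Confident class scores yield lower entropy than near-uniform ones.
confident = np.array([[8.0, 0.0, 0.0]])
uniform = np.array([[1.0, 1.0, 1.0]])
```

A full method would backpropagate this entropy into (a subset of) the network's parameters at test time; the sketch only shows the objective being minimized.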
Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks
Due to the high similarity between MRI data and videos, we conduct extensive empirical studies on video recognition techniques for MRI classification to answer the questions: (1) can we directly use video recognition models for MRI classification, (2) which model is more appropriate for MRI, (3) are the common tricks like data augmentation in video recognition still useful for MRI classification?
Analysis of Real-Time Hostile Activity Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms
To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens.
Few-Shot Video Classification via Representation Fusion and Promotion Learning
This operation maximizes the contribution of discriminative frames to further capture the similarity of support and query samples from the same category.
Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos
Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6%.
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
We explore an efficient approach to establish a foundational video-text model.