Video Classification

172 papers with code • 11 benchmarks • 17 datasets

Video Classification is the task of producing a label that is relevant to a video given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video, given the features and annotations of its individual frames. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video and assigning one or more labels to each frame in the video.
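The difference between frame labels and a video-level label can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical per-frame classifier that emits one logit per label. It averages (mean-pools) those logits across frames, which is only one of many possible aggregation schemes, and then derives either a single global label (argmax) or one or more global labels (sigmoid plus threshold). The labels, logits, and the 0.5 threshold are made up for illustration.

```python
import math
from typing import Dict, List

# Hypothetical per-frame classifier output: one logit per label for each frame.
FrameLogits = Dict[str, float]


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def video_scores(frames: List[FrameLogits]) -> Dict[str, float]:
    """Mean-pool each label's logit over all frames, then apply a sigmoid."""
    labels = frames[0].keys()
    n = len(frames)
    return {label: sigmoid(sum(f[label] for f in frames) / n) for label in labels}


def top_label(scores: Dict[str, float]) -> str:
    """Single global label: the highest-scoring class."""
    return max(scores, key=scores.get)


def multi_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """One or more global labels: every class whose probability clears the threshold."""
    return sorted(label for label, p in scores.items() if p >= threshold)


# A tree dominates the first frame, but "hiking" is consistent across the clip.
frames = [
    {"tree": 3.0, "hiking": 1.0},
    {"tree": -2.0, "hiking": 2.0},
    {"tree": -2.0, "hiking": 1.5},
]

frame_labels = [max(f, key=f.get) for f in frames]
scores = video_scores(frames)
print(frame_labels)          # ['tree', 'hiking', 'hiking']
print(top_label(scores))     # hiking
print(multi_labels(scores))  # ['hiking']
```

Here "tree" wins the first frame, but the pooled video-level score favors "hiking", mirroring the example above. Real systems usually learn the aggregation (for example with temporal attention or learned pooling) rather than fixing it to a plain mean.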

Source: Efficient Large Scale Video Classification

Benchmarks

These leaderboards track progress in Video Classification.

Dataset	Best Model
Breakfast	MA-LMM
COIN	MA-LMM
YouTube-8M	DCGN (self-attention graph pooling)
MoB	VTN
Hockey Fight Detection Dataset	CNN+LSTM
Kinetics	Multigrid
Charades	Multigrid
Something-Something V1	MSNet-R50En (ours)
Something-Something V2	MSNet-R50En (ours)
Multimodal PISA	MMDL
Home Action Genome	Cooperative Ours (3rd-person)

Libraries

Use these libraries to find Video Classification models and implementations.

open-mmlab/mmaction2: 6 papers, 3,898 GitHub stars

rwightman/pytorch-image-models: 3 papers, 29,800 GitHub stars

facebookresearch/detectron: 2 papers, 26,145 GitHub stars

open-mmlab/mmclassification: 2 papers, 3,168 GitHub stars

4 of 6 libraries shown.

Latest papers with no code

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

no code yet • 8 Jan 2024

To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer (AVT), which leverages the effective spatio-temporal representation learned by the video Transformer to improve action recognition accuracy.

Neural architecture impact on identifying temporally extended Reinforcement Learning tasks

no code yet • 4 Oct 2023

In addition, motivated by recent developments in attention-based video classification models using Vision Transformer, we also propose a Vision Transformer-based architecture for the image-based RL domain.

Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding

no code yet • 20 Sep 2023

While most modern video understanding models operate on short-range clips, real-world videos are often several minutes long with semantically consistent segments of variable length.

Language as the Medium: Multimodal Video Classification through text only

no code yet • 19 Sep 2023

Despite an exciting new wave of multimodal machine learning models, current approaches still struggle to interpret the complex contextual relationships between the different modalities present in videos.

AV-MaskEnhancer: Enhancing Video Representations through Audio-Visual Masked Autoencoder

no code yet • 15 Sep 2023

Learning high-quality video representations has significant applications in computer vision but remains challenging.

The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework

no code yet • 11 Jul 2023

Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.

Active Learning for Video Classification with Frame Level Queries

no code yet • International Joint Conference on Neural Networks (IJCNN) 2023

To the best of our knowledge, this is the first research effort to develop an active learning framework for video classification, where the annotators need to inspect only a few frames to produce a label, rather than watching the end-to-end video.
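The frame-level query idea can be illustrated with a generic uncertainty-sampling sketch. This is an assumption rather than the paper's method: the excerpt does not say how frames are chosen, so here a hypothetical `select_query_frames` helper ranks the frames of an unlabeled video by the entropy of the current model's predictions and returns the k most uncertain ones for the annotator to inspect.

```python
import math
from typing import List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one frame's predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_query_frames(frame_probs: List[List[float]], k: int) -> List[int]:
    """Hypothetical helper: indices of the k most uncertain frames, in temporal order."""
    ranked = sorted(
        range(len(frame_probs)),
        key=lambda i: entropy(frame_probs[i]),
        reverse=True,  # highest entropy (least confident) first
    )
    return sorted(ranked[:k])


# Made-up softmax outputs of the current model on four frames of an unlabeled video.
frame_probs = [
    [0.90, 0.10],
    [0.50, 0.50],
    [0.60, 0.40],
    [0.99, 0.01],
]
print(select_query_frames(frame_probs, k=2))  # [1, 2]
```

Only the selected frames would be shown to the annotator, which is what keeps the labeling cost below watching the full video.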

Boosting Breast Ultrasound Video Classification by the Guidance of Keyframe Feature Centers

no code yet • 12 Jun 2023

The coherence loss uses the feature centers generated by the static images to guide the frame attention in the video model.

Multi-label Video Classification for Underwater Ship Inspection

no code yet • 27 May 2023

Today, ship hull inspection, including the examination of the external coating, the detection of defects, and other types of external degradation such as corrosion and marine growth, is conducted underwater by means of Remotely Operated Vehicles (ROVs).

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

no code yet • NeurIPS 2023

We conduct extensive empirical studies and reveal the following key insights: 1) Performing gradient descent updates by alternating on diverse modalities, loss functions, and tasks, with varying input resolutions, efficiently improves the model.
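As a toy illustration of alternating updates (not a reproduction of the paper's model), the sketch below trains one shared scalar parameter on two hypothetical tasks, taking each gradient step on a single task's loss in turn instead of on the summed loss. The task losses, learning rate, and helper names are assumptions chosen for illustration.

```python
from typing import Callable, List

Grad = Callable[[float], float]


def alternating_gd(w: float, task_grads: List[Grad], lr: float, steps: int) -> List[float]:
    """Update a shared parameter using one task's gradient per step, cycling through tasks."""
    history = []
    for step in range(steps):
        grad = task_grads[step % len(task_grads)]  # this step's task
        w -= lr * grad(w)
        history.append(w)
    return history


# Two toy "tasks" sharing a scalar parameter w.
def grad_a(w: float) -> float:
    """Gradient of task A's loss (w - 1)^2."""
    return 2.0 * (w - 1.0)


def grad_b(w: float) -> float:
    """Gradient of task B's loss (w - 3)^2."""
    return 2.0 * (w - 3.0)


history = alternating_gd(w=0.0, task_grads=[grad_a, grad_b], lr=0.1, steps=200)
print(round(history[-2], 3), round(history[-1], 3))  # 1.889 2.111
```

With a constant learning rate the parameter settles into a small oscillation (17/9 after a task-A step, 19/9 after a task-B step) centered on w = 2, the minimizer of the summed loss. Each step needs only one task's forward and backward pass, which is one reason alternating updates can be cheaper per step than summing every task's loss.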
