Video Classification

172 papers with code • 11 benchmarks • 17 datasets

Video Classification is the task of producing a label that is relevant to the video given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video, given the features and annotations of its frames. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame in the video.

Source: Efficient Large Scale Video Classification
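
The difference between frame-level and video-level labels described above can be illustrated with a minimal sketch. It assumes per-frame class scores are already available and uses simple mean pooling over frames, which is only one possible aggregation and is not taken from the cited source. The label set, the scores, and the function names `frame_labels` and `video_label` are hypothetical and exist only for illustration.

```python
from typing import List

# Hypothetical label set and per-frame scores (illustrative numbers,
# not the output of any real model).
LABELS = ["tree", "hiking", "cooking"]

def argmax(values: List[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)

def frame_labels(frame_scores: List[List[float]], labels: List[str]) -> List[str]:
    """Frame-level task: pick the top-scoring label for each frame."""
    return [labels[argmax(scores)] for scores in frame_scores]

def video_label(frame_scores: List[List[float]], labels: List[str]) -> str:
    """Video-level task: average the scores over frames, then pick the top label."""
    n_frames = len(frame_scores)
    mean_scores = [
        sum(scores[i] for scores in frame_scores) / n_frames
        for i in range(len(labels))
    ]
    return labels[argmax(mean_scores)]

scores = [
    [0.6, 0.4, 0.0],  # a tree dominates this frame
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.6, 0.1],
]

print(frame_labels(scores, LABELS))  # one label per frame
print(video_label(scores, LABELS))   # one label for the whole video
```

In this toy input the first frame is labeled "tree" on its own, while the pooled video-level label is "hiking", mirroring the example in the task description.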

Benchmarks

These leaderboards are used to track progress in Video Classification.

Dataset	Best Model
Breakfast	MA-LMM
COIN	MA-LMM
YouTube-8M	DCGN (self-attention graph pooling)
MoB	VTN
Hockey Fight Detection Dataset	CNN+LSTM
Kinetics	Multigrid
Charades	Multigrid
Something-Something V1	MSNet-R50En (ours)
Something-Something V2	MSNet-R50En (ours)
Multimodal PISA	MMDL
Home Action Genome	Cooperative Ours (3rd-person)

Libraries

Use these libraries to find Video Classification models and implementations

Library	Papers	GitHub stars
open-mmlab/mmaction2	6	3,862
rwightman/pytorch-image-models	3	29,603
facebookresearch/detectron	2	26,132
open-mmlab/mmclassification	2	3,128

See all 6 libraries.

Datasets

Latest papers with no code

Learning Correlation Structures for Vision Transformers

no code yet • 5 Apr 2024

We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention.

Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning

no code yet • 27 Mar 2024

We present a generic Reinforcement Learning (RL) framework optimized for crafting adversarial attacks on different model types, spanning ECG signal analysis (1D), image classification (2D), and video classification (3D).

Pig aggression classification using CNN, Transformers and Recurrent Networks

no code yet • 13 Mar 2024

Thus, the development of applications can assist breeders in making decisions to improve production performance and reduce costs, since animal behavior is currently analyzed by humans, which is error-prone and time-consuming.

Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification

no code yet • 13 Mar 2024

Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially when an immense volume of video content is being constantly generated.

Learning Expressive And Generalizable Motion Features For Face Forgery Detection

no code yet • 8 Mar 2024

However, current sequence-based face forgery detection methods use general video classification networks directly, which discard the special and discriminative motion information for face manipulation detection.

A Multimodal Handover Failure Detection Dataset and Baselines

no code yet • 28 Feb 2024

To address this deficit, we present the multimodal Handover Failure Detection dataset, which consists of failures induced by the human participant, such as ignoring the robot or not releasing the object.

Time-, Memory- and Parameter-Efficient Visual Adaptation

no code yet • 5 Feb 2024

Here, we outperform a prior adaptor-based method which could only scale to a 1 billion parameter backbone, or fully-finetuning a smaller backbone, with the same GPU and less training time.

Short-Form Videos and Mental Health: A Knowledge-Guided Neural Topic Model

no code yet • 11 Jan 2024

To prevent widespread consequences, platforms are eager to predict these videos' impact on viewers' mental health.

Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

no code yet • 8 Jan 2024

In recent years, researchers have combined audio and video signals to deal with challenges where actions are not well represented or captured by visual cues.

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

no code yet • 8 Jan 2024

To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy.
