Video Classification

172 papers with code • 11 benchmarks • 17 datasets

Video Classification is the task of producing a label that is relevant to a video given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video, given the features and annotations of its individual frames. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame inside the video.

Source: Efficient Large Scale Video Classification
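To make the difference between frame-level and video-level labels concrete, here is a minimal Python sketch. It assumes that a hypothetical per-frame classifier has already produced a score in [0, 1] for each label in each frame. It derives video-level labels with mean pooling and a fixed threshold. That is one simple, common aggregation choice used here for illustration, not the method of the cited source. The function names `frame_labels` and `video_labels` are likewise hypothetical.

```python
from typing import Dict, List


def frame_labels(frame_scores: List[Dict[str, float]]) -> List[str]:
    """Per-frame task: pick the highest-scoring label in each frame."""
    return [max(scores, key=scores.get) for scores in frame_scores]


def video_labels(frame_scores: List[Dict[str, float]], threshold: float = 0.5) -> List[str]:
    """Video-level task (assumed multi-label): mean-pool each label's score
    over all frames and keep every label whose pooled score meets the threshold.
    A label missing from a frame contributes a score of 0 for that frame."""
    if not frame_scores:
        return []
    totals: Dict[str, float] = {}
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
    n_frames = len(frame_scores)
    pooled = {label: total / n_frames for label, total in totals.items()}
    return sorted(label for label, p in pooled.items() if p >= threshold)


# Hypothetical scores: "tree" dominates the first frame only,
# while "hiking" is consistently present across the video.
scores = [
    {"tree": 0.9, "hiking": 0.6},
    {"tree": 0.2, "hiking": 0.8},
    {"tree": 0.1, "hiking": 0.7},
]
print(frame_labels(scores))  # ['tree', 'hiking', 'hiking']
print(video_labels(scores))  # ['hiking']
```

Here the first frame is labeled "tree", but the pooled video-level score for "tree" is about 0.4, so only "hiking" describes the video as a whole. Lowering `threshold` admits more global labels. Other pooling schemes (max pooling, learned temporal models) would change which labels survive.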

Latest papers with no code

Learning Correlation Structures for Vision Transformers

no code yet • 5 Apr 2024

We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention.

Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning

no code yet • 27 Mar 2024

We present a generic Reinforcement Learning (RL) framework optimized for crafting adversarial attacks on different model types, spanning ECG signal analysis (1D), image classification (2D), and video classification (3D).

Pig aggression classification using CNN, Transformers and Recurrent Networks

no code yet • 13 Mar 2024

Thus, developing such applications can help breeders make decisions that improve production performance and reduce costs, since animal behavior is otherwise analyzed by humans, a process that is error-prone and time-consuming.

Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification

no code yet • 13 Mar 2024

Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially when an immense volume of video content is being constantly generated.

Learning Expressive And Generalizable Motion Features For Face Forgery Detection

no code yet • 8 Mar 2024

However, current sequence-based face forgery detection methods use general video classification networks directly, which discards the distinctive, discriminative motion information needed for face manipulation detection.

A Multimodal Handover Failure Detection Dataset and Baselines

no code yet • 28 Feb 2024

To address this deficit, we present the multimodal Handover Failure Detection dataset, which consists of failures induced by the human participant, such as ignoring the robot or not releasing the object.

Time-, Memory- and Parameter-Efficient Visual Adaptation

no code yet • 5 Feb 2024

Here, with the same GPU and less training time, we outperform both a prior adaptor-based method, which could only scale to a 1-billion-parameter backbone, and full fine-tuning of a smaller backbone.

Short-Form Videos and Mental Health: A Knowledge-Guided Neural Topic Model

no code yet • 11 Jan 2024

To prevent widespread consequences, platforms are eager to predict these videos' impact on viewers' mental health.

Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification

no code yet • 8 Jan 2024

In recent years, researchers have combined audio and video signals to handle cases where actions are not well represented or captured by visual cues.

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

no code yet • 8 Jan 2024

To learn from multimodal videos effectively, we propose a novel audio-video recognition approach termed Audio-Video Transformer (AVT), which leverages the effective spatio-temporal representation of a video Transformer to improve action recognition accuracy.