Video Classification
172 papers with code • 11 benchmarks • 17 datasets
Video Classification is the task of producing a label that is relevant to the video given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video given the features and annotations of the individual frames. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame inside the video.
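The frame-to-video aggregation described above can be sketched with the simplest strategy, mean pooling of per-frame class probabilities. This is an illustrative example, not a method from any of the papers below; the probabilities and label names are made up:

```python
import numpy as np

# Hypothetical per-frame class probabilities for a 4-frame clip over
# three candidate labels. A frame-level classifier may strongly favor
# "tree" on a single close-up frame, but averaging across all frames
# recovers the label that best describes the whole video.
labels = ["tree", "hiking", "beach"]
frame_probs = np.array([
    [0.70, 0.20, 0.10],  # close-up of a tree
    [0.10, 0.80, 0.10],  # trail shot
    [0.15, 0.75, 0.10],  # trail shot
    [0.20, 0.70, 0.10],  # hikers in frame
])

# Mean pooling: the simplest frame-to-video aggregation.
video_probs = frame_probs.mean(axis=0)
video_label = labels[int(np.argmax(video_probs))]
print(video_label)  # -> hiking
```

In practice, learned aggregators (temporal pooling layers, recurrent networks, or transformers) replace the plain average, but the principle of combining frame-level evidence into one video-level decision is the same.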
Latest papers with no code
Deep Unsupervised Key Frame Extraction for Efficient Video Classification
The proposed TSDPC is a generic and powerful framework with two advantages over previous work; one is that it can determine the number of key frames automatically.
BOREx: Bayesian-Optimization--Based Refinement of Saliency Map for Image- and Video-Classification Models
We propose a new black-box method BOREx (Bayesian Optimization for Refinement of visual model Explanation) to refine a heat map produced by any method.
Transfer-learning for video classification: Video Swin Transformer on multiple domains
From the results, we conclude that VST generalizes well enough to classify out-of-domain videos without retraining when the target classes are of the same type as the classes used to train the model.
Linear Video Transformer with Feature Fixation
Therefore, we propose a feature fixation module to reweight the feature importance of the query and key before computing linear attention.
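The idea of reweighting query and key features before a linear attention kernel can be illustrated with a minimal sketch. The gating vectors below are hypothetical placeholders standing in for the paper's learned feature fixation module, and the feature map is the common elu+1 choice, assumed here rather than taken from the paper:

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map commonly used in linear attention.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_with_fixation(q, k, v, gate_q, gate_k):
    # Illustrative "feature fixation": scale query/key channels with
    # per-channel gates before the attention kernel. Real modules
    # would learn these gates from the input.
    q = elu_plus_one(q * gate_q)
    k = elu_plus_one(k * gate_k)
    # Linear attention: O(n * d^2) instead of the O(n^2 * d) softmax form.
    kv = k.T @ v              # (d, d_v) summary of keys and values
    z = q @ k.sum(axis=0)     # (n,) per-query normalizer
    return (q @ kv) / z[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4                   # sequence length, feature dimension
q, k, v = rng.normal(size=(3, n, d))
gate_q = rng.uniform(0.5, 1.5, size=d)
gate_k = rng.uniform(0.5, 1.5, size=d)
out = linear_attention_with_fixation(q, k, v, gate_q, gate_k)
print(out.shape)  # (8, 4)
```

Because the feature map is strictly positive, the normalizer `z` is always positive and the output stays finite for any input, which is what makes the kernel-summary factorization valid.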
FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification
Furthermore, the holistic features are refined by the multi-scale temporal relations in a novel fusion module for yielding more discriminative video representations.
Traffic Congestion Prediction using Deep Convolutional Neural Networks: A Color-coding Approach
This work proposes a unique technique for traffic video classification using a color-coding scheme before training the traffic data in a Deep convolutional neural network.
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Our work empirically explores the low data regime for video classification and discovers that, surprisingly, transformers perform extremely well in the low-labeled video setting compared to CNNs.
UAV-CROWD: Violent and non-violent crowd activity simulator from the perspective of UAV
Unmanned Aerial Vehicles (UAVs) have gained significant traction in recent years, particularly in the context of surveillance.
Motion Sensitive Contrastive Learning for Self-supervised Video Representation
Contrastive learning has shown great potential in video representation learning.
Two-Stream Transformer Architecture for Long Video Understanding
Pure vision transformer architectures are highly effective for short video classification and action recognition tasks.