Video Classification
172 papers with code • 11 benchmarks • 17 datasets
Video Classification is the task of producing a label that is relevant to a video given its frames. A good video-level classifier not only provides accurate frame labels, but also best describes the entire video given the features and annotations of its various frames. For example, a video might contain a tree in some frames, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame inside the video.
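The global-label setting described above can be sketched minimally: pool frame-level features (here by averaging) into one clip descriptor, then classify that descriptor. The feature dimension, frame count, and class count below are arbitrary illustrative choices, not tied to any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)

def classify_video(frame_features, weights, bias):
    """Average-pool per-frame features into a single clip descriptor,
    then apply a linear classifier to produce one video-level label."""
    clip_descriptor = frame_features.mean(axis=0)   # (d,)
    logits = clip_descriptor @ weights + bias       # (num_classes,)
    return int(np.argmax(logits))

# Toy example: 16 frames, 32-dim features, 5 candidate labels.
frames = rng.standard_normal((16, 32))
W = rng.standard_normal((32, 5))
b = np.zeros(5)
label = classify_video(frames, W, b)
```

Mean pooling is the simplest aggregation; the papers below largely propose richer ways to combine frame information over time.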
Libraries
Use these libraries to find Video Classification models and implementations.
Most implemented papers
Token Shift Transformer for Video Classification
It is worth noting that our TokShift transformer is a pure, convolution-free video transformer that achieves computational efficiency for video understanding.
Deep Temporal Linear Encoding Networks
Advantages of TLEs are: (a) they encode the entire video into a compact feature representation, learning the semantics and a discriminative feature space; (b) they are applicable to all kinds of networks like 2D and 3D CNNs for video classification; and (c) they model feature interactions in a more expressive way and without loss of information.
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
In this paper, we devise multiple variants of bottleneck building blocks in a residual learning framework by simulating $3\times3\times3$ convolutions with $1\times3\times3$ convolutional filters on spatial domain (equivalent to 2D CNN) plus $3\times1\times1$ convolutions to construct temporal connections on adjacent feature maps in time.
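The appeal of the factorization above is partly arithmetic: replacing a $3\times3\times3$ kernel with a $1\times3\times3$ spatial kernel plus a $3\times1\times1$ temporal kernel shrinks the per-layer weight count. A quick parameter-count comparison (channel width chosen for illustration, not from the paper):

```python
def conv_params(in_ch, out_ch, kt, kh, kw):
    """Weight count of a (kt x kh x kw) 3D convolution, biases ignored."""
    return in_ch * out_ch * kt * kh * kw

C = 64  # illustrative channel width
full_3d  = conv_params(C, C, 3, 3, 3)                                # one 3x3x3 conv
factored = conv_params(C, C, 1, 3, 3) + conv_params(C, C, 3, 1, 1)   # spatial + temporal
ratio = factored / full_3d   # 12/27, i.e. ~44% of the full-3D weights
```

The factored pair keeps the same spatio-temporal receptive field while using 12 weights per channel pair instead of 27.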
Compact Generalized Non-local Network
The non-local module is designed for capturing long-range spatio-temporal dependencies in images and videos.
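A non-local operation in its embedded-Gaussian form lets every position attend to every other position, which is how long-range dependencies are captured. A minimal sketch over a flattened set of spatio-temporal positions, with illustrative sizes and random projections standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g):
    """Embedded-Gaussian non-local operation on positions x of shape
    (n, d): each output is an attention-weighted sum over all n positions."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    attn = softmax(theta @ phi.T, axis=-1)   # (n, n) pairwise weights
    return attn @ g

# 12 spatio-temporal positions with 16-dim features (toy sizes).
x = rng.standard_normal((12, 16))
out = non_local(x,
                rng.standard_normal((16, 8)),
                rng.standard_normal((16, 8)),
                rng.standard_normal((16, 8)))
```

In the full module a final projection and residual connection map the output back to the input dimension; those are omitted here for brevity.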
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures
Learning to represent videos is a very challenging task both algorithmically and computationally.
MotionSqueeze: Neural Motion Feature Learning for Video Understanding
As the frame-by-frame optical flows require heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding.
VideoMix: Rethinking Data Augmentation for Video Classification
Recent data augmentation strategies have been reported to address the overfitting problems in static image classifiers.
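A VideoMix-style augmentation can be sketched as pasting a spatio-temporal cube from one clip into another and mixing the labels in proportion to the surviving volume. This is an illustrative sketch, not the authors' implementation; the cube coordinates are fixed here for clarity rather than sampled.

```python
import numpy as np

def cube_mix(video_a, video_b, t0, t1, y0, y1, x0, x1):
    """Paste a spatio-temporal cube from video_b into video_a and return
    the mixed clip plus lambda, the fraction still coming from video_a
    (used to mix the two labels)."""
    mixed = video_a.copy()
    mixed[t0:t1, y0:y1, x0:x1] = video_b[t0:t1, y0:y1, x0:x1]
    total = np.prod(video_a.shape[:3])
    cube = (t1 - t0) * (y1 - y0) * (x1 - x0)
    lam = 1.0 - cube / total
    return mixed, lam

# Toy clips: 8 frames of 16x16 single-channel video.
a = np.zeros((8, 16, 16, 1))
b = np.ones((8, 16, 16, 1))
mixed, lam = cube_mix(a, b, 2, 6, 4, 12, 4, 12)
```

The training loss would then combine the two labels as `lam * loss(label_a) + (1 - lam) * loss(label_b)`, mirroring CutMix in the image domain.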
Reinforcement Learning with Latent Flow
Temporal information is essential to learning effective policies with Reinforcement Learning (RL).
Busy-Quiet Video Disentangling for Video Classification
We design a trainable Motion Band-Pass Module (MBPM) for separating busy information from quiet information in raw video data.
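The busy/quiet decomposition can be illustrated with a fixed temporal filter: a moving average extracts the slowly varying "quiet" component, and the residual carries the high-frequency "busy" motion. Note this fixed-kernel split is only an analogue of the paper's trainable MBPM.

```python
import numpy as np

def band_pass_split(video):
    """Split a clip into 'quiet' (temporal low-pass via a 3-frame moving
    average with edge replication) and 'busy' (the residual) components.
    A fixed-kernel stand-in for a *trainable* band-pass module."""
    padded = np.concatenate([video[:1], video, video[-1:]], axis=0)
    quiet = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    busy = video - quiet
    return quiet, busy

# A single pixel brightening linearly over 6 frames.
clip = np.arange(6, dtype=float).reshape(6, 1, 1)
quiet, busy = band_pass_split(clip)
```

On a linear brightness ramp the interior residual is zero: steady change is "quiet", while abrupt temporal changes would land in the "busy" component.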
Out-of-Distribution Detection Using Union of 1-Dimensional Subspaces
In this paper, we argue that OOD samples can be detected more easily if the training data is embedded into a low-dimensional space, such that the embedded training samples lie on a union of 1-dimensional subspaces.