Video Classification

172 papers with code • 11 benchmarks • 17 datasets

Video Classification is the task of producing a label that is relevant to the video given its frames. A good video level classifier is one that not only provides accurate frame labels, but also best describes the entire video given the features and the annotations of the various frames in the video. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels that are needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels for each frame inside the video.

Source: Efficient Large Scale Video Classification

Libraries

Use these libraries to find Video Classification models and implementations

Most implemented papers

Video Classification with Channel-Separated Convolutional Networks

facebookresearch/VMZ ICCV 2019

It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

facebookresearch/detectron2 CVPR 2022

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.

UniFormer: Unifying Convolution and Self-attention for Visual Recognition

sense-x/uniformer 24 Jan 2022

Different from the typical transformer blocks, the relation aggregators in our UniFormer block are equipped with local and global token affinity respectively in shallow and deep layers, allowing to tackle both redundancy and dependency for efficient and effective representation learning.

YouTube-8M: A Large-Scale Video Classification Benchmark

google/youtube-8m 27 Sep 2016

Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.

ECO: Efficient Convolutional Network for Online Video Understanding

mzolfaghari/ECO-efficient-video-understanding ECCV 2018

In this paper, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time.

Learnable pooling with Context Gating for video classification

antoine77340/Youtube-8M-WILLOW 21 Jun 2017

In particular, we evaluate our method on the large-scale multi-modal Youtube-8M v2 dataset and outperform all other methods in the Youtube 8M Large-Scale Video Understanding challenge.

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

longxiang92/Flash-MNIST CVPR 2018

In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets.

Representation Flow for Action Recognition

piergiaj/representation-flow-cvpr19 CVPR 2019

Our representation flow layer is a fully-differentiable layer designed to capture the `flow' of any representation channel within a convolutional neural network for action recognition.

Revisiting Classifier: Transferring Vision-Language Models for Video Recognition

whwu95/text4vis 4 Jul 2022

In this study, we focus on transferring knowledge for video classification tasks.

TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition

chihyaoma/Activity-Recognition-with-CNN-and-RNN 30 Mar 2017

We demonstrate that using both RNNs (using LSTMs) and Temporal-ConvNets on spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve the overall performance.