Video Classification

172 papers with code • 11 benchmarks • 17 datasets

Video Classification is the task of producing a label that is relevant to a video given its frames. A good video-level classifier not only provides accurate frame labels, but also best describes the entire video given the features and annotations of its individual frames. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame in the video.

Source: Efficient Large Scale Video Classification
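
The distinction between frame-level and video-level labels can be illustrated with a minimal sketch. It assumes each frame already has a dictionary of per-class scores and combines them by simple averaging; the averaging rule, the function names and the example scores are illustrative assumptions, not the method of the source paper.

```python
from collections import defaultdict


def frame_level_labels(frame_scores):
    """Return the highest-scoring label for each frame."""
    return [max(scores, key=scores.get) for scores in frame_scores]


def video_level_label(frame_scores):
    """Average per-frame class scores and return (best_label, averaged_scores).

    Mean pooling is an assumed aggregation rule chosen for simplicity.
    """
    if not frame_scores:
        raise ValueError("need at least one frame")
    totals = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] += score
    n_frames = len(frame_scores)
    averaged = {label: total / n_frames for label, total in totals.items()}
    best = max(averaged, key=averaged.get)
    return best, averaged


# Hypothetical scores for a three-frame hiking video.
frames = [
    {"hiking": 0.6, "tree": 0.3, "camping": 0.1},
    {"hiking": 0.2, "tree": 0.7, "camping": 0.1},  # a tree dominates this frame
    {"hiking": 0.7, "tree": 0.1, "camping": 0.2},
]

per_frame = frame_level_labels(frames)
video_label, averaged = video_level_label(frames)
print(per_frame)
print(video_label)
```

Here one frame is labelled "tree" on its own, yet the video as a whole is still labelled "hiking", which mirrors the example in the description above. Real systems typically replace the averaging step with learned temporal models.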

Most implemented papers

Billion-scale semi-supervised learning for image classification

facebookresearch/semi-supervised-ImageNet1K-models 2 May 2019

This paper presents a study of semi-supervised learning with large convolutional networks.

Reversible Vision Transformers

facebookresearch/SlowFast CVPR 2022

Reversible Vision Transformers achieve a reduced memory footprint of up to 15.5x at roughly identical model complexity, parameters and accuracy, demonstrating the promise of reversible vision transformers as an efficient backbone for hardware resource limited training regimes.

Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification

MohsenFayyaz89/T3D 22 Nov 2017

Thus, by finetuning this network, we beat the performance of generic and recent methods in 3D CNNs, which were trained on large video datasets, e.g. Sports-1M, and finetuned on the target datasets, e.g. HMDB51/UCF101.

Fine-grained Activity Recognition in Baseball Videos

piergiaj/mlb-youtube 9 Apr 2018

In this paper, we introduce a challenging new dataset, MLB-YouTube, designed for fine-grained activity detection.

Timeception for Complex Action Recognition

noureldien/timeception CVPR 2019

This paper focuses on the temporal aspect for recognizing human activities in videos, an important visual cue that has long been undervalued.

Gated Channel Transformation for Visual Recognition

z-x-yang/GCT CVPR 2020

This lightweight layer incorporates a simple l2 normalization, making our transformation unit applicable at the operator level without much increase in parameters.

A Multigrid Method for Efficiently Training Video Models

facebookresearch/SlowFast CVPR 2020

We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).

Non-Local Neural Networks With Grouped Bilinear Attentional Transforms

BA-Transform/BAT-Image-Classification CVPR 2020

The core of our method is a learnable and data-adaptive bilinear attentional transform (BA-Transform), whose merits are threefold: first, BA-Transform is versatile enough to model a wide spectrum of local or global attentional operations, such as emphasizing specific local regions.

Pyramidal Convolution: Rethinking Convolutional Neural Networks for Visual Recognition

iduta/pyconv 20 Jun 2020

This work introduces pyramidal convolution (PyConv), which is capable of processing the input at multiple filter scales.

Revisiting ResNets: Improved Training and Scaling Strategies

tensorflow/tpu NeurIPS 2021

Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet.