Video Recognition

147 papers with code • 0 benchmarks • 10 datasets

Video Recognition is the process of acquiring, processing, and analysing data from a visual source, specifically video.

Most implemented papers

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

facebookresearch/OctConv ICCV 2019

Similarly, the output feature maps of a convolution layer can also be seen as a mixture of information at different frequencies.
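To make the "mixture of frequencies" idea concrete, here is a minimal sketch that splits a single 2D feature map into a coarse low-frequency part and a high-frequency residual. This is only an illustration of the decomposition, not the Octave Convolution operator from the paper; the function names, the 2x2 average pooling, and the nearest-neighbour upsampling are all assumptions chosen for simplicity, and the map is assumed to have even height and width.

```python
def avg_pool_2x2(fmap):
    """Downsample a 2D map (list of rows) by averaging each 2x2 block."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample_2x(fmap):
    """Nearest-neighbour upsampling by a factor of 2 in each dimension."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def split_frequencies(fmap):
    """Return (low, high): a half-resolution smooth map and the detail residual."""
    low = avg_pool_2x2(fmap)
    coarse = upsample_2x(low)
    high = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(fmap, coarse)]
    return low, high


# Hypothetical 4x4 feature map used only for illustration.
feature_map = [
    [1.0, 3.0, 2.0, 2.0],
    [5.0, 7.0, 2.0, 2.0],
    [0.0, 0.0, 4.0, 8.0],
    [0.0, 0.0, 8.0, 4.0],
]
low, high = split_frequencies(feature_map)

# Adding the upsampled low part back to the residual recovers the input.
restored = [
    [a + b for a, b in zip(r1, r2)]
    for r1, r2 in zip(upsample_2x(low), high)
]
```

The low-frequency part is stored at half the spatial resolution, which is where the potential saving in memory and computation comes from in this simplified picture.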

Video Swin Transformer

SwinTransformer/Video-Swin-Transformer CVPR 2022

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module ICCV 2019

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.
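As a rough sketch of the temporal shift idea named in the title, the snippet below moves one slice of channels one frame backward in time and another slice one frame forward, leaving the rest untouched and zero-padding at the clip boundaries. It is a simplified illustration under stated assumptions, not the repository's implementation: spatial dimensions are omitted (each frame is just a list of channel values), and the function name and the `fold_div=4` default are chosen here for readability.

```python
def temporal_shift(clip, fold_div=4):
    """Shift a fraction of channels along the time axis.

    clip: list of T frames, each a list of C channel values.
    The first C // fold_div channels take their value from the next frame,
    the following C // fold_div channels take it from the previous frame,
    and the remaining channels are copied unchanged.
    """
    t, c = len(clip), len(clip[0])
    fold = c // fold_div
    out = [[0.0] * c for _ in range(t)]
    for ti in range(t):
        for ci in range(c):
            if ci < fold:
                src = ti + 1  # value comes from the next frame
            elif ci < 2 * fold:
                src = ti - 1  # value comes from the previous frame
            else:
                src = ti      # channel is not shifted
            if 0 <= src < t:  # out-of-range sources stay zero-padded
                out[ti][ci] = clip[src][ci]
    return out


# Hypothetical clip: 3 frames, 4 channels, values encode (frame, channel).
clip = [
    [10.0, 11.0, 12.0, 13.0],
    [20.0, 21.0, 22.0, 23.0],
    [30.0, 31.0, 32.0, 33.0],
]
shifted = temporal_shift(clip)
```

The shift only moves existing values between frames, so it adds no multiplications or parameters; temporal mixing then happens when an ordinary 2D layer processes the shifted channels.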

Would Mega-scale Datasets Further Enhance Spatiotemporal 3D CNNs?

kenshohara/3D-ResNets-PyTorch 10 Apr 2020

Therefore, in the present paper, we conduct an exploratory study in order to improve spatiotemporal 3D CNNs as follows: (i) Recently proposed large-scale video datasets help improve spatiotemporal 3D CNNs in terms of video classification accuracy.

Micro-Batch Training with Batch-Channel Normalization and Weight Standardization

joe-siyuan-qiao/WeightStandardization 25 Mar 2019

Batch Normalization (BN) has become an out-of-box technique to improve deep network training.

X3D: Expanding Architectures for Efficient Video Recognition

facebookresearch/SlowFast CVPR 2020

This paper presents X3D, a family of efficient video networks that progressively expand a tiny 2D image classification architecture along multiple network axes, in space, time, width and depth.

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

garythung/torch-lrcn CVPR 2015

Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise.

Multiscale Vision Transformers

facebookresearch/SlowFast ICCV 2021

We evaluate this fundamental architectural prior for modeling the dense nature of visual signals for a variety of video recognition tasks where it outperforms concurrent vision transformers that rely on large scale external pre-training and are 5-10x more costly in computation and parameters.

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection

facebookresearch/detectron2 CVPR 2022

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for image and video classification, as well as object detection.