Video Recognition

147 papers with code • 0 benchmarks • 10 datasets

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

What Can Simple Arithmetic Operations Do for Temporal Modeling?

whwu95/ATM ICCV 2023

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

65
18 Jul 2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

talalwasim/video-focalnets ICCV 2023

Video transformer designs are based on self-attention that can model global context at a high computational cost.

81
13 Jul 2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

facebookresearch/hiera 1 Jun 2023

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

692
01 Jun 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

francis-rings/ila ICCV 2023

While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention.

28
20 Apr 2023

Use Your Head: Improving Long-Tail Video Recognition

tobyperrett/lmr-release CVPR 2023

We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties.

4
03 Apr 2023

Frame Flexible Network

bespontaneous/ffn CVPR 2023

To fix this issue, we propose a general framework, named Frame Flexible Network (FFN), which not only enables the model to be evaluated at different numbers of frames to adjust its computation, but also significantly reduces the memory cost of storing multiple models.

52
26 Mar 2023

The effectiveness of MAE pre-pretraining for billion-scale pretraining

facebookresearch/maws ICCV 2023

While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.

63
23 Mar 2023

Making Vision Transformers Efficient from A Token Sparsification View

changsn/STViT-R CVPR 2023

In this work, we propose a novel Semantic Token ViT (STViT), for efficient global and local vision transformers, which can also be revised to serve as backbone for downstream tasks.

30
15 Mar 2023

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

wlin-at/maxi ICCV 2023

We adapt a VL model for zero-shot and few-shot action recognition using a collection of unlabeled videos and an unpaired action dictionary.

21
15 Mar 2023

Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition

alibaba/lightweight-neural-architecture-search 5 Mar 2023

In this work, we propose to automatically design efficient 3D CNN architectures via a novel training-free neural architecture search approach tailored for 3D CNNs considering the model complexity.

345
05 Mar 2023