Video Recognition
147 papers with code • 0 benchmarks • 10 datasets
Video Recognition is the process of obtaining, processing, and analysing data received from a visual source, specifically video.
Benchmarks
These leaderboards are used to track progress in Video Recognition
Libraries
Use these libraries to find Video Recognition models and implementations
Datasets
Latest papers
What Can Simple Arithmetic Operations Do for Temporal Modeling?
We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.
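The snippet above refers to temporal modeling built from simple arithmetic operations (ATMs). A minimal sketch of the general idea, assuming frame-wise subtraction and addition as cheap temporal cues; this is an illustration, not the paper's actual ATM design:

```python
import numpy as np

def arithmetic_temporal_module(features):
    """Cheap temporal modeling via frame-wise arithmetic.

    features: array of shape (T, C) -- one feature vector per frame.
    Returns per-frame features augmented with a temporal-difference
    signal (motion-like cue) and a temporal-sum signal (aggregation
    cue). Hypothetical stand-in for an ATM-style module.
    """
    diff = np.diff(features, axis=0, prepend=features[:1])  # frame deltas
    summ = features + np.roll(features, shift=1, axis=0)    # neighbor sums
    return np.concatenate([features, diff, summ], axis=1)

feats = np.random.rand(8, 16)  # 8 frames, 16-dim features
out = arithmetic_temporal_module(feats)
print(out.shape)  # (8, 48)
```

No attention or convolution is involved, which is what keeps the computational cost low.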
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Video transformer designs are based on self-attention that can model global context at a high computational cost.
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already captures the essential information without temporal attention.
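To make the frame-alignment idea concrete, here is a minimal sketch that reweights each frame's features by cosine similarity to a reference frame. The similarity-based weighting is an assumption for illustration, not the paper's learnable alignment:

```python
import numpy as np

def align_frames(features, ref_idx=0):
    """Align per-frame features to a reference frame.

    features: (T, C). Each frame is scaled by its cosine similarity
    to the reference frame -- a hand-rolled stand-in for learnable
    alignment, requiring no temporal attention.
    """
    ref = features[ref_idx]
    norms = np.linalg.norm(features, axis=1) * np.linalg.norm(ref)
    sims = features @ ref / np.maximum(norms, 1e-8)  # (T,) similarities
    return features * sims[:, None]

feats = np.random.rand(4, 8)  # 4 frames, 8-dim features
aligned = align_frames(feats)
print(aligned.shape)  # (4, 8)
```

The reference frame is left unchanged (its similarity to itself is 1), while dissimilar frames are attenuated.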
Use Your Head: Improving Long-Tail Video Recognition
We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties.
Frame Flexible Network
To fix this issue, we propose a general framework, named Frame Flexible Network (FFN), which not only enables the model to be evaluated at different frames to adjust its computation, but also reduces the memory costs of storing multiple models significantly.
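Evaluating one model at different frame counts is the key property claimed above. A minimal sketch of why a single set of weights can handle any number of frames, assuming mean-pooling over time (an illustrative mechanism, not FFN itself):

```python
import numpy as np

def classify(frames, w):
    """Frame-count-agnostic video classifier sketch.

    frames: (T, C) per-frame features for any T; w: (C, K) classifier
    weights. Mean-pooling over time collapses the temporal dimension,
    so one weight matrix serves all frame counts.
    """
    pooled = frames.mean(axis=0)  # (C,) temporal average
    return pooled @ w             # (K,) class logits

w = np.random.rand(16, 5)  # shared weights: 16-dim features, 5 classes
print(classify(np.random.rand(4, 16), w).shape)   # 4 frames -> (5,)
print(classify(np.random.rand(16, 16), w).shape)  # 16 frames, same weights
```

Adjusting the number of input frames then trades accuracy for computation without storing multiple models.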
The effectiveness of MAE pre-pretraining for billion-scale pretraining
While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.
Making Vision Transformers Efficient from A Token Sparsification View
In this work, we propose a novel Semantic Token ViT (STViT), for efficient global and local vision transformers, which can also be revised to serve as backbone for downstream tasks.
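Token sparsification, as referenced above, reduces the number of tokens a vision transformer must process. A minimal sketch that keeps the top-k tokens under a hypothetical importance score; STViT's actual semantic-token generation is more involved:

```python
import numpy as np

def sparsify_tokens(tokens, scores, k):
    """Keep the k highest-scoring tokens.

    tokens: (N, C) patch tokens; scores: (N,) importance scores
    (e.g. attention-derived -- a hypothetical choice here).
    Original token order is preserved among the survivors.
    """
    keep = np.argsort(scores)[-k:]
    return tokens[np.sort(keep)]

tokens = np.random.rand(196, 64)  # 14x14 patch tokens, 64-dim
scores = np.random.rand(196)
reduced = sparsify_tokens(tokens, scores, k=16)
print(reduced.shape)  # (16, 64)
```

Downstream attention then runs over 16 tokens instead of 196, which is where the efficiency gain comes from.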
MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge
We adapt a VL model for zero-shot and few-shot action recognition using a collection of unlabeled videos and an unpaired action dictionary.
Maximizing Spatio-Temporal Entropy of Deep 3D CNNs for Efficient Video Recognition
In this work, we propose to automatically design efficient 3D CNN architectures via a novel training-free neural architecture search approach tailored for 3D CNNs that accounts for model complexity.