Video Recognition

145 papers with code • 0 benchmarks • 10 datasets

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video

leexinhao/ZeroI2V 2 Oct 2023

In this paper, our goal is to present a zero-cost adaptation paradigm (ZeroI2V) to transfer the image transformers to video recognition tasks (i.e., introduce zero extra cost to the adapted models during inference).

10
02 Oct 2023

Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning

alibaba-mmai-research/dist ICCV 2023

When pre-training on the large-scale Kinetics-710, we achieve 89.7% on Kinetics-400 with a frozen ViT-L model, which verifies the scalability of DiST.

25
14 Sep 2023

Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers

WISION-Lab/eventful-transformer ICCV 2023

In this work, we exploit temporal redundancy between subsequent inputs to reduce the cost of Transformers for video processing.

29
25 Aug 2023

Learning from Semantic Alignment between Unpaired Multiviews for Egocentric Video Recognition

wqtwjt1996/sum-l ICCV 2023

To facilitate the data efficiency of multiview learning, we further perform video-text alignment for first-person and third-person videos, to fully leverage the semantic knowledge to improve video representations.

3
22 Aug 2023

Audio-Visual Class-Incremental Learning

weiguopian/av-cil_iccv2023 ICCV 2023

We demonstrate that joint audio-visual modeling can improve class-incremental learning, but current methods fail to preserve semantic similarity between audio and visual features as the number of incremental steps grows.

16
21 Aug 2023

Helping Hands: An Object-Aware Ego-Centric Video Recognition Model

chuhanxx/helping_hand_for_egocentric_videos ICCV 2023

We demonstrate the performance of the object-aware representations learnt by our model by: (i) evaluating it for strong transfer, i.e. through zero-shot testing, on a number of downstream video-text retrieval and classification benchmarks; and (ii) using the learnt representations as input for long-term video understanding tasks (e.g. Episodic Memory in Ego4D).

25
15 Aug 2023

Orthogonal Temporal Interpolation for Zero-Shot Video Recognition

sweetorangezhuyan/mm2023_oti 14 Aug 2023

We propose a model called OTI for ZSVR by employing orthogonal temporal interpolation and the matching loss based on VLMs.

7
14 Aug 2023

Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation

mark12ding/sta ICCV 2023

Based on the STA score, we are able to progressively prune the tokens without introducing any additional parameters or requiring further re-training.

18
08 Aug 2023

What Can Simple Arithmetic Operations Do for Temporal Modeling?

whwu95/ATM ICCV 2023

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

39
18 Jul 2023

Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition

talalwasim/video-focalnets ICCV 2023

Video transformer designs are based on self-attention that can model global context at a high computational cost.

80
13 Jul 2023