Video Recognition

147 papers with code • 0 benchmarks • 10 datasets

Video recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization

wengzejia1/open-vclip 1 Feb 2023

Our framework extends CLIP with minimal modifications to model spatial-temporal relationships in videos, making it a specialized video classifier, while striving for generalization.

91 stars • 01 Feb 2023

Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring

farewellthree/stan CVPR 2023

In this paper, based on the CLIP model, we revisit temporal modeling in the context of image-to-video knowledge transferring, which is the key point for extending image-text pretrained models to the video domain.

85 stars • 26 Jan 2023

Efficient Robustness Assessment via Adversarial Spatial-Temporal Focus on Videos

deepsota/astfocus 3 Jan 2023

To implement this idea, we design the novel Adversarial spatial-temporal Focus (AstFocus) attack on videos, which performs attacks on the simultaneously focused key frames and key regions from the inter-frames and intra-frames in the video.

5 stars • 03 Jan 2023

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

whwu95/Cap4Video CVPR 2023

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

204 stars • 31 Dec 2022

Efficient Movie Scene Detection using State-Space Transformers

md-mohaiminul/trans4mer CVPR 2023

Given a sequence of frames divided into movie shots (uninterrupted periods where the camera position does not change), the S4A block first applies self-attention to capture short-range intra-shot dependencies.

22 stars • 29 Dec 2022
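
The intra-shot self-attention step described in the entry above can be illustrated with a minimal sketch. This is not the authors' S4A implementation: it assumes unparameterized scaled dot-product attention (no learned query, key, or value projections), and the names `self_attention`, `intra_shot_attention`, and `shot_lengths` are hypothetical. It shows only the idea that frames attend to other frames within the same shot and never across shot boundaries.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention over a list of feature vectors.

    Assumption for illustration: queries, keys, and values are the raw
    features themselves, with no learned projections.
    """
    dim = len(frames[0])
    out = []
    for query in frames:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in frames
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, frames)) for d in range(dim)]
        )
    return out


def intra_shot_attention(frames, shot_lengths):
    """Apply self-attention separately inside each shot (hypothetical helper)."""
    if any(length <= 0 for length in shot_lengths):
        raise ValueError("every shot must contain at least one frame")
    if sum(shot_lengths) != len(frames):
        raise ValueError("shot lengths must cover every frame exactly once")
    out, start = [], 0
    for length in shot_lengths:
        out.extend(self_attention(frames[start:start + length]))
        start += length
    return out


# Toy input: 5 frames with 2-dimensional features, in shots of 2 and 3 frames.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
mixed = intra_shot_attention(frames, [2, 3])
```

Because attention is computed per shot, changing the frames of one shot leaves the outputs of every other shot unchanged, which is what restricts this step to short-range dependencies.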

VLG: General Video Recognition with Web Textual Knowledge

mcg-nju/vlg 3 Dec 2022

Our VLG is first pre-trained on video and language datasets to learn a shared feature space, and then devises a flexible bi-modal attention head to collaborate high-level semantic concepts under different settings.

8 stars • 03 Dec 2022

SVFormer: Semi-supervised Video Transformer for Action Recognition

chenhsing/svformer CVPR 2023

In this paper, we investigate the use of transformer models under the semi-supervised learning (SSL) setting for action recognition.

79 stars • 23 Nov 2022

Look More but Care Less in Video Recognition

bespontaneous/afnet-pytorch 18 Nov 2022

To tackle this problem, we propose Ample and Focal Network (AFNet), which is composed of two branches to utilize more frames but with less computation.

20 stars • 18 Nov 2022

Cluster and Aggregate: Face Recognition with Large Probe Set

mk-minchul/caface 19 Oct 2022

Advances in attention and recurrent modules have led to feature fusion that can model the relationship among the images in the input set.

32 stars • 19 Oct 2022

Towards a Unified View on Visual Parameter-Efficient Transfer Learning

bruceyo/V-PETL 3 Oct 2022

Towards this goal, we propose a framework with a unified view of PETL called visual-PETL (V-PETL) to investigate the effects of different PETL techniques, data scales of downstream domains, positions of trainable parameters, and other aspects affecting the trade-off.

26 stars • 03 Oct 2022