Video Recognition

147 papers with code • 0 benchmarks • 10 datasets

Video Recognition is the process of obtaining, processing, and analysing data received from a visual source, specifically video.

Benchmarks

These leaderboards are used to track progress in Video Recognition.

No evaluation results yet.

Libraries

Use these libraries to find Video Recognition models and implementations.

open-mmlab/mmaction2 • 5 papers • 3,876 stars

open-mmlab/mmtracking • 3 papers • 3,372 stars

facebookresearch/pytorchvideo • 3 papers • 3,178 stars

towhee-io/towhee • 3 papers • 2,972 stars

See all 9 libraries.

Datasets

Latest papers with no code

LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model

no code yet • 18 Mar 2024

Benefiting from the popularity and scalable usability of the Segment Anything Model (SAM), we first extract different regions according to semantic information and then track them through the video stream to maintain temporal consistency.

Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition

no code yet • 29 Feb 2024

Finally, we blend external multimodal knowledge in the Adapt stage by inserting multimodal knowledge adaptation modules into networks.

Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition

no code yet • 11 Jan 2024

We introduce Hierarchical Augmentation and Distillation (HAD), which comprises the Hierarchical Augmentation Module (HAM) and Hierarchical Distillation Module (HDM) to efficiently utilize the hierarchical structure of data and models, respectively.

Motion Guided Token Compression for Efficient Masked Video Modeling

no code yet • 10 Jan 2024

By implementing MGTC with the masking ratio of 25%, we further augment accuracy by 0.1 and simultaneously reduce computational costs by over 31% on Kinetics-400.
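
The excerpt above does not say how MGTC chooses which tokens to drop, so the following is only a generic illustration of what a 25% masking ratio means for a set of video tokens. It assumes, purely for illustration, that motion is scored by the mean absolute difference between matching tokens in two consecutive frames and that the lowest-scoring quarter is dropped. The function names `frame_diff_scores` and `keep_indices` are hypothetical and not taken from the paper.

```python
def frame_diff_scores(prev_tokens, cur_tokens):
    """Score each token by the mean absolute difference between
    its features in the previous and the current frame.
    (Assumed motion proxy, not the paper's definition.)"""
    return [
        sum(abs(a - b) for a, b in zip(p, c)) / len(p)
        for p, c in zip(prev_tokens, cur_tokens)
    ]


def keep_indices(scores, mask_ratio=0.25):
    """Return the sorted indices of tokens that survive after
    dropping the lowest-scoring `mask_ratio` fraction."""
    n_drop = int(len(scores) * mask_ratio)
    # Order token indices from least to most motion.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    # Drop the first n_drop (least motion) and keep the rest.
    return sorted(order[n_drop:])


# Four toy tokens with two features each, in two consecutive frames.
prev_frame = [[0, 0], [1, 1], [2, 2], [3, 3]]
cur_frame = [[0, 0], [1, 2], [4, 4], [3, 6]]

scores = frame_diff_scores(prev_frame, cur_frame)
kept = keep_indices(scores, mask_ratio=0.25)
print(scores, kept)
```

With four tokens and a ratio of 0.25, one token (the static one) is dropped and three are passed on, which is the sense in which a masking ratio trades tokens for compute.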

Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification

no code yet • 8 Jan 2024

To learn from multimodal videos effectively, in this work, we propose a novel audio-video recognition approach termed audio video Transformer, AVT, leveraging the effective spatio-temporal representation by the video Transformer to improve action recognition accuracy.

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

no code yet • 4 Dec 2023

To this end, we design effective cross-snippet propagation modules to gradually exchange short-term video information among different snippets from two levels.

Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer

no code yet • 11 Sep 2023

Phacoemulsification cataract surgery (PCS) is a routine procedure conducted using a surgical microscope, heavily reliant on the skill of the ophthalmologist.

Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving

no code yet • ICCV 2023

VTD is a promising new direction for exploring the unification of perception tasks in autonomous driving.

Temporal-Distributed Backdoor Attack Against Video Based Action Recognition

no code yet • 21 Aug 2023

Although there are extensive studies on backdoor attacks against image data, the susceptibility of video-based systems under backdoor attacks remains largely unexplored.

Audio-Visual Glance Network for Efficient Video Recognition

no code yet • ICCV 2023

To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video.
