Video Recognition

147 papers with code • 0 benchmarks • 10 datasets

Video Recognition is the process of obtaining, processing, and analysing data from a visual source, specifically video.

Latest papers with no code

Audio-Visual Glance Network for Efficient Video Recognition

no code yet • ICCV 2023

To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video.

On the Importance of Spatial Relations for Few-shot Action Recognition

no code yet • 14 Aug 2023

We are thus motivated to investigate the importance of spatial relations and propose a more accurate few-shot action recognition method that leverages both spatial and temporal information.

View while Moving: Efficient Video Recognition in Long-untrimmed Videos

no code yet • 9 Aug 2023

To this end, inspired by human cognition, we propose a novel recognition paradigm of "View while Moving" for efficient long-untrimmed video recognition.

TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter

no code yet • 22 Jun 2023

In situations involving system upgrades that require updating the upstream foundation model, it becomes essential to re-train all downstream modules to adapt to the new foundation model, which is inflexible and inefficient.

Enhanced Multimodal Representation Learning with Cross-modal KD

no code yet • CVPR 2023

This paper explores the tasks of leveraging auxiliary modalities which are only available at training to enhance multimodal representation learning through cross-modal Knowledge Distillation (KD).

A two-way translation system of Chinese sign language based on computer vision

no code yet • 3 Jun 2023

As the main means of communication for deaf people, sign language has a special grammatical order, so it is meaningful and valuable to develop a real-time translation system for sign language.

Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition

no code yet • 22 May 2023

This paper studies the computational offloading of video action recognition in edge computing.

Inter-frame Accelerate Attack against Video Interpolation Models

no code yet • 11 May 2023

We apply adversarial attacks to video frame interpolation (VIF) models and find that they are very vulnerable to adversarial examples.

Multi-object Video Generation from Single Frame Layouts

no code yet • 6 May 2023

In this paper, we study video synthesis with emphasis on simplifying the generation conditions.

Efficient Decision-based Black-box Patch Attacks on Video Recognition

no code yet • ICCV 2023

First, STDE introduces target videos as patch textures and only adds patches on keyframes that are adaptively selected by temporal difference.
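
The idea of selecting keyframes by temporal difference, mentioned in the summary above, can be illustrated with a minimal sketch. This is not the STDE method itself: the function names `temporal_differences` and `select_keyframes`, the flat-list frame representation, and the "keep the k frames with the largest change" rule are all assumptions made for illustration only.

```python
from typing import List, Sequence


def temporal_differences(frames: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute pixel change between each frame and its predecessor.

    Each frame is a flat sequence of pixel intensities (an assumed,
    simplified representation). The first frame has no predecessor,
    so its score is 0.0.
    """
    if not frames:
        return []
    scores = [0.0]
    for prev, curr in zip(frames, frames[1:]):
        if len(prev) != len(curr) or len(curr) == 0:
            raise ValueError("frames must be non-empty and equally sized")
        total = sum(abs(c - p) for p, c in zip(prev, curr))
        scores.append(total / len(curr))
    return scores


def select_keyframes(frames: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k frames with the largest temporal difference.

    Hypothetical selection rule: rank frames by their difference score
    and keep the top k, returned in chronological order.
    """
    scores = temporal_differences(frames)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy video: five frames of four pixels each.
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],      # no change
    [10, 10, 10, 10],  # large change
    [10, 10, 10, 10],  # no change
    [10, 0, 10, 0],    # moderate change
]
keyframes = select_keyframes(frames, k=2)
```

In this toy input, frames 2 and 4 change the most relative to their predecessors, so those are the frames a patch would be placed on under the assumed rule. A real attack would work on decoded video tensors and would likely use a more adaptive criterion than a fixed k.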