Video Recognition
147 papers with code • 0 benchmarks • 10 datasets
Video Recognition is the process of acquiring, processing, and analysing data from a visual source, specifically video.
Libraries
Use these libraries to find Video Recognition models and implementations
Datasets
Latest papers with no code
Audio-Visual Glance Network for Efficient Video Recognition
To address this issue, we propose Audio-Visual Glance Network (AVGN), which leverages the commonly available audio and visual modalities to efficiently process the spatio-temporally important parts of a video.
On the Importance of Spatial Relations for Few-shot Action Recognition
We are thus motivated to investigate the importance of spatial relations and propose a more accurate few-shot action recognition method that leverages both spatial and temporal information.
View while Moving: Efficient Video Recognition in Long-untrimmed Videos
To this end, inspired by human cognition, we propose a novel recognition paradigm of "View while Moving" for efficient long-untrimmed video recognition.
TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter
In situations involving system upgrades that require updating the upstream foundation model, it becomes essential to re-train all downstream modules to adapt to the new foundation model, which is inflexible and inefficient.
Enhanced Multimodal Representation Learning with Cross-modal KD
This paper explores the tasks of leveraging auxiliary modalities which are only available at training to enhance multimodal representation learning through cross-modal Knowledge Distillation (KD).
A two-way translation system of Chinese sign language based on computer vision
As the main means of communication for deaf people, sign language has a special grammatical order, so it is meaningful and valuable to develop a real-time translation system for sign language.
Spatiotemporal Attention-based Semantic Compression for Real-time Video Recognition
This paper studies the computational offloading of video action recognition in edge computing.
Inter-frame Accelerate Attack against Video Interpolation Models
We apply adversarial attacks to VIF models and find that the VIF models are very vulnerable to adversarial examples.
Multi-object Video Generation from Single Frame Layouts
In this paper, we study video synthesis with emphasis on simplifying the generation conditions.
Efficient Decision-based Black-box Patch Attacks on Video Recognition
First, STDE introduces target videos as patch textures and only adds patches on keyframes that are adaptively selected by temporal difference.
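The idea of choosing keyframes by temporal difference, mentioned in the snippet above, can be illustrated with a minimal sketch. This is not the STDE method from the paper. It assumes each frame is a flattened list of grayscale pixel values, scores each frame by its mean absolute difference from the previous frame, and keeps the `k` highest-scoring frames. The names `temporal_differences` and `select_keyframes`, the frame representation, and the fixed `k` are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened sequence of grayscale pixel values.
Frame = Sequence[float]


def temporal_differences(frames: Sequence[Frame]) -> List[float]:
    """Mean absolute pixel difference between each frame and its predecessor.

    The first frame has no predecessor, so its score is 0.0.
    """
    if not frames:
        return []
    scores = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        if len(prev) != len(cur) or len(cur) == 0:
            raise ValueError("frames must be non-empty and equally sized")
        total = sum(abs(a - b) for a, b in zip(cur, prev))
        scores.append(total / len(cur))
    return scores


def select_keyframes(frames: Sequence[Frame], k: int) -> List[int]:
    """Indices of the k frames with the largest temporal difference.

    Hypothetical selection rule: a fixed top-k, returned in temporal order.
    """
    scores = temporal_differences(frames)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    video = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],      # identical to the previous frame
        [10, 10, 10, 10],  # large change
        [10, 10, 10, 10],  # identical to the previous frame
        [10, 0, 10, 0],    # moderate change
    ]
    print(temporal_differences(video))
    print(select_keyframes(video, k=2))
```

A fixed top-k is the simplest possible rule. An adaptive scheme, as the snippet describes, would presumably derive the number or position of keyframes from the difference scores themselves, which this sketch does not attempt.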