Video Classification
175 papers with code • 11 benchmarks • 17 datasets
Video Classification is the task of producing a label that is relevant to a video given its frames. A good video-level classifier is one that not only provides accurate frame labels but also best describes the entire video given the features and annotations of its various frames. For example, a video might contain a tree in some frames, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video and assigning one or more labels to each frame of the video.
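As a minimal illustration of the global-label setting described above (not the method of any paper listed below), one common recipe is to extract a feature vector per frame, pool the features over time, and apply a classifier to the pooled descriptor. The sketch below uses NumPy with made-up dimensions and random weights; `classify_video`, the feature size, and the class count are all illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over class scores.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def classify_video(frame_features, weights, bias):
    """Mean-pool per-frame features over time, then apply a linear
    classifier to produce one global label for the whole video."""
    clip_feature = frame_features.mean(axis=0)   # (D,) pooled video descriptor
    logits = weights @ clip_feature + bias       # (C,) class scores
    probs = softmax(logits)
    return int(np.argmax(probs)), probs

# Toy example: 8 frames, 16-dim features, 3 classes (all values made up).
rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))
W = rng.normal(size=(3, 16))
b = np.zeros(3)
label, probs = classify_video(frames, W, b)
```

In practice the per-frame features come from a pretrained backbone and the pooling step is often replaced by temporal models (recurrent networks, attention, or 3D convolutions), but the pool-then-classify structure is the same.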
Libraries
Use these libraries to find Video Classification models and implementations.

Datasets
Subtasks
Latest papers with no code
NetFlick: Adversarial Flickering Attacks on Deep Learning Based Video Compression
Experimental results demonstrate that NetFlick can successfully deteriorate the performance of video compression frameworks in both digital- and physical-settings and can be further extended to attack downstream video classification networks.
Unified Keypoint-based Action Recognition Framework via Structured Keypoint Pooling
A point cloud deep-learning paradigm is introduced to the action recognition, and a unified framework along with a novel deep neural network architecture called Structured Keypoint Pooling is proposed.
Selective Structured State-Spaces for Long-Form Video Understanding
To address this limitation, we present a novel Selective S4 (i.e., S5) model that employs a lightweight mask generator to adaptively select informative image tokens, resulting in more efficient and accurate modeling of long-term spatiotemporal dependencies in videos.
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
We show that visual representations learned under ViC-MAE generalize well to both video and image classification tasks.
Temporal Coherent Test-Time Optimization for Robust Video Classification
To exploit information in video with self-supervised learning, TeCo uses global content from video clips and optimizes models for entropy minimization.
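TeCo's exact procedure is not given here; as a generic sketch of the entropy-minimization objective the summary above mentions, the snippet below computes the average prediction entropy over a batch of clips, the quantity that test-time adaptation methods drive down by gradient descent on unlabeled test video. The function name and dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Stable softmax along the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prediction_entropy(logits):
    """Shannon entropy of softmax predictions, averaged over clips.
    Test-time entropy minimization updates model parameters to reduce
    this value, making predictions more confident on test data."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

# Confident class scores yield lower entropy than near-uniform ones.
confident = np.array([[8.0, 0.0, 0.0]])
uniform = np.array([[1.0, 1.0, 1.0]])
```

A full method would backpropagate this entropy into (a subset of) the network's parameters at test time; the sketch only shows the objective being minimized.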
Video4MRI: An Empirical Study on Brain Magnetic Resonance Image Analytics with CNN-based Video Classification Frameworks
Due to the high similarity between MRI data and videos, we conduct extensive empirical studies on video recognition techniques for MRI classification to answer the questions: (1) can we directly use video recognition models for MRI classification, (2) which model is more appropriate for MRI, (3) are the common tricks like data augmentation in video recognition still useful for MRI classification?
Analysis of Real-Time Hostile Activity Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms
To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens.
Few-Shot Video Classification via Representation Fusion and Promotion Learning
This operation maximizes the contribution of discriminative frames to further capture the similarity of support and query samples from the same category.
Truncate-Split-Contrast: A Framework for Learning from Mislabeled Videos
Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6%.
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
We explore an efficient approach to establish a foundational video-text model.