Video Classification
168 papers with code • 9 benchmarks • 15 datasets
Video Classification is the task of producing a label that is relevant to a video given its frames. A good video-level classifier not only provides accurate frame labels but also best describes the entire video given the features and annotations of its various frames. For example, a video might contain a tree in some frames, but the label central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame within it.
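One common baseline for turning frame-level predictions into a video-level label is to pool the per-frame class scores over time. The sketch below illustrates this with simple mean pooling; the `classify_video` function and the toy class indices are hypothetical, not from any specific library.

```python
import numpy as np

def classify_video(frame_logits: np.ndarray) -> int:
    """Aggregate per-frame class scores into one video-level label.

    frame_logits: array of shape (num_frames, num_classes) holding the
    raw scores a frame-level classifier produced for each frame.
    Returns the index of the predicted video-level class.
    """
    # Mean-pool scores over time: a class that appears in only one
    # frame (e.g. a tree) is outweighed by the class that dominates
    # the whole video (e.g. "hiking").
    video_logits = frame_logits.mean(axis=0)
    return int(np.argmax(video_logits))

# Toy example: 4 frames, 3 classes (0 = "tree", 1 = "hiking", 2 = "beach").
frames = np.array([
    [2.0, 1.0, 0.1],   # one frame where a tree dominates
    [0.2, 1.5, 0.1],
    [0.1, 1.8, 0.2],
    [0.3, 1.6, 0.1],
])
print(classify_video(frames))  # prints 1: the video-level label is "hiking"
```

More elaborate aggregation schemes (attention over frames, temporal models) follow the same pattern but learn how to weight frames instead of averaging them uniformly.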
Libraries
Use these libraries to find Video Classification models and implementations.
Latest papers
Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer
With the aim of effectively identifying prostate cancer, we propose a framework for the classification of clinically significant prostate cancer (csPCa) from multi-modality TRUS videos.
Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning
High-quality and consistent annotations are fundamental to the successful development of robust machine learning models.
FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War
We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification.
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.
Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach
It comprises two core components: a snippet clustering component that groups the snippets into multiple latent clusters and a cluster classification component that further classifies the cluster as foreground or background.
MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation
To alleviate the issue, we propose to adapt the trajectory attention for both the dense pixel features and object queries, aiming to improve the short-term and long-term tracking results, respectively.
Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments
The framework enhances 3D MobileNet, a neural architecture optimized for speed in video classification, by incorporating knowledge distillation and model quantization to balance model accuracy and computational efficiency.
Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval
To address these issues, we propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations.
Text-to-feature diffusion for audio-visual few-shot learning
Training deep learning models for video classification from audio-visual data commonly requires immense amounts of labeled training data collected via a costly process.
Identifying Misinformation on YouTube through Transcript Contextual Analysis with Transformer Models
We apply the trained models to three datasets: (a) YouTube Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and (c) Fake-News dataset (a collection of articles).