Video Classification

168 papers with code • 9 benchmarks • 15 datasets

Video Classification is the task of producing a label that is relevant to the video given its frames. A good video-level classifier is one that not only provides accurate frame labels, but also best describes the entire video given the features and annotations of its frames. For example, a video might contain a tree in some frames, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels to each frame in the video.

Source: Efficient Large Scale Video Classification
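The distinction between frame-level and video-level labels can be illustrated with a minimal sketch. It assumes per-frame class scores are already available (the values below are hypothetical) and uses simple mean pooling over frames to pick the video-level label. Mean pooling is one common baseline chosen here for illustration, not the method of the source paper; the function names `frame_labels` and `video_label` are likewise made up for this example.

```python
from collections import defaultdict


def frame_labels(frame_scores):
    """Return the top-scoring label for each frame (frame-level task)."""
    return [max(scores, key=scores.get) for scores in frame_scores]


def video_label(frame_scores):
    """Average each label's score over all frames and return the
    top label together with the per-label means (video-level task)."""
    if not frame_scores:
        raise ValueError("need at least one frame")
    totals = defaultdict(float)
    for scores in frame_scores:
        for label, score in scores.items():
            totals[label] += score
    n = len(frame_scores)
    means = {label: total / n for label, total in totals.items()}
    return max(means, key=means.get), means


# Hypothetical per-frame scores for a four-frame clip (assumed values).
frame_scores = [
    {"hiking": 0.6, "tree": 0.3},
    {"hiking": 0.4, "tree": 0.5},
    {"hiking": 0.7, "tree": 0.2},
    {"hiking": 0.8, "tree": 0.1},
]

per_frame = frame_labels(frame_scores)
top, means = video_label(frame_scores)
print(per_frame)  # one label per frame
print(top)        # single label for the whole video
```

In this toy clip the second frame is labeled "tree" on its own, while the pooled scores still favor "hiking" for the video as a whole, which mirrors the example in the description above.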

Multi-modality transrectal ultrasound video classification for identification of clinically significant prostate cancer

2313595986/prostatetrus 14 Feb 2024

With the aim of effectively identifying prostate cancer, we propose a framework for the classification of clinically significant prostate cancer (csPCa) from multi-modality TRUS videos.

0

Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning

netflix/videoannotator 9 Feb 2024

High-quality and consistent annotations are fundamental to the successful development of robust machine learning models.

16

FakeClaim: A Multiple Platform-driven Dataset for Identification of Fake News on 2023 Israel-Hamas War

gautamshahi/fakeclaim 29 Jan 2024

We contribute the first publicly available dataset of factual claims from different platforms and fake YouTube videos on the 2023 Israel-Hamas war for automatic fake YouTube video classification.

1

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

opengvlab/internvl 21 Dec 2023

However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs.

595

Revisiting Foreground and Background Separation in Weakly-supervised Temporal Action Localization: A Clustering-based Approach

qinying-liu/case ICCV 2023

It comprises two core components: a snippet clustering component that groups the snippets into multiple latent clusters and a cluster classification component that further classifies the cluster as foreground or background.

98
21 Dec 2023

MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation

tacju/maxtron 30 Nov 2023

To alleviate the issue, we propose to adapt the trajectory attention for both the dense pixel features and object queries, aiming to improve the short-term and long-term tracking results, respectively.

25

Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments

calvintanama/qd-driver-activity-reco 10 Nov 2023

The framework enhances 3D MobileNet, a neural architecture optimized for speed in video classification, by incorporating knowledge distillation and model quantization to balance model accuracy and computational efficiency.

8

Differentiable Resolution Compression and Alignment for Efficient Video Classification and Retrieval

dun-research/drca 15 Sep 2023

To address these issues, we propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism, which compresses non-essential information in the early stage of the network to reduce computational costs while maintaining consistent temporal correlations.

4

Text-to-feature diffusion for audio-visual few-shot learning

explainableml/avdiff-gfsl 7 Sep 2023

Training deep learning models for video classification from audio-visual data commonly requires immense amounts of labeled training data collected via a costly process.

8

Identifying Misinformation on YouTube through Transcript Contextual Analysis with Transformer Models

christoschr97/misinf-detection-llms 22 Jul 2023

We apply the trained models to three datasets: (a) YouTube Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and (c) Fake-News dataset (a collection of articles).

0