Video Classification

172 papers with code • 11 benchmarks • 17 datasets

Video Classification is the task of producing a label that is relevant to the video given its frames. A good video level classifier is one that not only provides accurate frame labels, but also best describes the entire video given the features and the annotations of the various frames in the video. For example, a video might contain a tree in some frame, but the label that is central to the video might be something else (e.g., “hiking”). The granularity of the labels that are needed to describe the frames and the video depends on the task. Typical tasks include assigning one or more global labels to the video, and assigning one or more labels for each frame inside the video.

Source: Efficient Large Scale Video Classification

Libraries

Use these libraries to find Video Classification models and implementations

Text-to-feature diffusion for audio-visual few-shot learning

explainableml/avdiff-gfsl 7 Sep 2023

Training deep learning models for video classification from audio-visual data commonly requires immense amounts of labeled training data collected via a costly process.

8
07 Sep 2023

Identifying Misinformation on YouTube through Transcript Contextual Analysis with Transformer Models

christoschr97/misinf-detection-llms 22 Jul 2023

We apply the trained models to three datasets: (a) YouTube Vaccine-misinformation related videos, (b) YouTube Pseudoscience videos, and (c) Fake-News dataset (a collection of articles).

1
22 Jul 2023

MUVF-YOLOX: A Multi-modal Ultrasound Video Fusion Network for Renal Tumor Diagnosis

jeunyuli/muaf 15 Jul 2023

In addition, we design an object-level temporal aggregation (OTA) module that can automatically filter low-quality features and efficiently integrate temporal information from multiple frames to improve the accuracy of tumor diagnosis.

10
15 Jul 2023

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

pku-yuangroup/open-sora-plan 12 Jul 2023

The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged.

10,134
12 Jul 2023

Learning Unseen Modality Interaction

gerasmark/Reproducing-Unseen-Modality-Interaction NeurIPS 2023

Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.

1
22 Jun 2023

Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos

xinruizhou0106/csg-3dct_supp 5 Jun 2023

First, to avoid the requirement of laborious and unreliable annotation, we propose a novel and effective video classification network for weakly-supervised CSG.

2
05 Jun 2023

Malicious or Benign? Towards Effective Content Moderation for Children's Videos

syedhammadahmed/mob 24 May 2023

Online video platforms receive hundreds of hours of uploads every minute, making manual content moderation impossible.

2
24 May 2023

HateMM: A Multi-Modal Dataset for Hate Video Classification

hate-alert/hatemm 6 May 2023

Hate speech has become one of the most significant issues in modern society, having implications in both the online and the offline world.

7
06 May 2023

Verbs in Action: Improving verb understanding in video-language models

google-research/scenic ICCV 2023

Understanding verbs is crucial to modelling how people and objects interact with each other and the environment through space and time.

2,999
13 Apr 2023

SparseFormer: Sparse Visual Recognition via Limited Latent Tokens

showlab/sparseformer 7 Apr 2023

In this paper, we challenge this dense paradigm and present a new method, coined SparseFormer, to imitate human's sparse visual recognition in an end-to-end manner.

60
07 Apr 2023