Video Summarization
68 papers with code • 5 benchmarks • 13 datasets
Video Summarization aims to generate a short synopsis that summarizes the video content by selecting its most informative and important parts. The produced summary is usually composed of a set of representative video frames (a.k.a. video key-frames), or of video fragments (a.k.a. video key-fragments) stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter is known as a video skim.
Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
Latest papers
Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score
We show that the reconstruction loss of the model for a video with masked frames correlates with the representativeness of the remaining frames in the video.
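The underlying intuition can be illustrated with a toy proxy: drop (mask) some frames, reconstruct them from the frames that remain, and treat low reconstruction error as evidence that the kept frames are representative. The sketch below uses nearest-neighbour feature reconstruction purely for illustration; the paper itself uses a learned self-supervised model, and the function name and scoring scheme here are assumptions.

```python
import numpy as np

def restoration_score(features, keep_idx):
    """Toy representativeness proxy: reconstruct each masked (dropped)
    frame from the kept frames via nearest-neighbour features and return
    the negative mean reconstruction error, so a higher score means the
    kept set represents the video better. Illustrative only."""
    feats = np.asarray(features, dtype=float)
    keep = feats[sorted(keep_idx)]
    masked = [i for i in range(len(feats)) if i not in set(keep_idx)]
    if not masked:
        return 0.0
    # Error of each masked frame = distance to its closest kept frame.
    errs = [np.linalg.norm(feats[i] - keep, axis=1).min() for i in masked]
    return -float(np.mean(errs))
```

With two visually distinct scenes, keeping one frame from each scene scores higher than keeping two frames from the same scene, which is the correlation the paper exploits.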
UniVTG: Towards Unified Video-Language Temporal Grounding
Most methods in this direction develop task-specific models that are trained with type-specific labels, such as moment retrieval (time interval) and highlight detection (worthiness curve), which limits their ability to generalize to various VTG tasks and labels.
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks.
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
To address these challenges and provide a comprehensive dataset for this new direction, we have meticulously curated the MMSum dataset.
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries
Video summarization has become an increasingly important task in the field of computer vision due to the vast amount of video content available on the internet.
Hierarchical Video-Moment Retrieval and Step-Captioning
Our hierarchical benchmark consists of video retrieval, moment retrieval, and two novel moment segmentation and step captioning tasks.
SELF-VS: Self-supervised Encoding Learning For Video Summarization
Empirical evaluations on correlation-based metrics, such as Kendall's $\tau$ and Spearman's $\rho$, demonstrate the superiority of our approach compared to existing state-of-the-art methods in assigning relative scores to the input frames.
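These rank-correlation metrics compare predicted frame-importance scores against human-annotated ones, rewarding methods that order frames correctly rather than matching scores exactly. A minimal example using SciPy (assuming `scipy` is available; the score values are made up for illustration):

```python
from scipy.stats import kendalltau, spearmanr

# Predicted frame-importance scores vs. a human-annotated reference.
# (Toy numbers; real evaluations average over annotators and videos.)
pred = [0.9, 0.1, 0.4, 0.8, 0.2]
ref = [1.0, 0.0, 0.5, 0.7, 0.3]

tau, _ = kendalltau(pred, ref)   # agreement over frame pairs
rho, _ = spearmanr(pred, ref)    # Pearson correlation of the ranks
print(f"Kendall tau={tau:.3f}, Spearman rho={rho:.3f}")
```

Here both metrics equal 1.0 because the two score lists induce the same ranking of frames, even though the raw values differ.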
VideoXum: Cross-modal Visual and Textural Summarization of Videos
We propose a new joint video and text summarization task.
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
VideoSum: A Python Library for Surgical Video Summarization
It is thus unsurprising that substantial research efforts are made to develop methods aiming at mitigating the scarcity of annotated surgical data science (SDS) data.