Video Summarization

68 papers with code • 5 benchmarks • 13 datasets

Video Summarization aims to generate a short synopsis of a video by selecting its most informative and important parts. The produced summary is usually composed of a set of representative video frames (a.k.a. video key-frames), or of video fragments (a.k.a. video key-fragments) that have been stitched together in chronological order to form a shorter video. The former type of summary is known as a video storyboard, and the latter as a video skim.

Source: Video Summarization Using Deep Neural Networks: A Survey
Image credit: iJRASET
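
The distinction between a storyboard and a skim can be illustrated with a minimal sketch. It assumes that a per-frame importance score is already available (how such scores are produced is outside the scope of this example) and that the video has already been split into fragments. The function names `storyboard` and `skim`, the frame budget, and the greedy selection rule are illustrative assumptions, not the method of any paper listed here.

```python
from typing import List, Tuple

Fragment = Tuple[int, int]  # (start, end) frame indices, end exclusive


def storyboard(scores: List[float], k: int) -> List[int]:
    """Pick the k highest-scoring frames and return them in chronological order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def skim(scores: List[float], fragments: List[Fragment], budget: int) -> List[Fragment]:
    """Greedily pick fragments by mean score until the frame budget is used up.

    This is a simple heuristic (an assumption for illustration); it does not
    guarantee the best possible total score under the budget.
    """
    def mean_score(frag: Fragment) -> float:
        start, end = frag
        return sum(scores[start:end]) / (end - start)

    chosen: List[Fragment] = []
    used = 0
    for frag in sorted(fragments, key=mean_score, reverse=True):
        length = frag[1] - frag[0]
        if used + length <= budget:
            chosen.append(frag)
            used += length
    # Stitch the selected fragments back together in chronological order.
    return sorted(chosen)


# Hypothetical importance scores for an 8-frame video.
frame_scores = [0.1, 0.9, 0.8, 0.4, 0.1, 0.3, 0.7, 0.6]
video_fragments = [(0, 2), (2, 4), (4, 6), (6, 8)]

key_frames = storyboard(frame_scores, k=3)                      # [1, 2, 6]
key_fragments = skim(frame_scores, video_fragments, budget=4)   # [(2, 4), (6, 8)]
```

In both cases the selection is made by score but the output is re-sorted by time, which mirrors the chronological ordering described above.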

VideoSAGE: Video Summarization with Graph Representation Learning

intellabs/gravi-t 14 Apr 2024

We propose a graph-based representation learning framework for video summarization.

35

Enhancing Video Summarization with Context Awareness

hcmus-thesis-gulu/context-aware-summarization 6 Apr 2024

Despite the importance of video summarization, there is a lack of diverse and representative datasets, hindering comprehensive evaluation and benchmarking of algorithms.

1

Cluster-based Video Summarization with Temporal Context Awareness

hcmus-thesis-gulu/tac-sum 6 Apr 2024

In this paper, we present TAC-SUM, a novel and efficient training-free approach for video summarization that addresses the limitations of existing cluster-based models by incorporating temporal context.

0

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

yeliudev/R2-Tuning 2 Apr 2024

Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries.

16

$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

yeliudev/R2-Tuning 31 Mar 2024

Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries.

16

ANIM-400K: A Large-Scale Dataset for Automated End-To-End Dubbing of Video

davidmchan/anim400k 10 Jan 2024

The Internet's wealth of content, with up to 60% published in English, starkly contrasts the global population, where only 18.8% are English speakers, and just 5.1% consider it their native language, leading to disparities in online information access.

84

Shot2Story20K: A New Benchmark for Comprehensive Understanding of Multi-shot Videos

bytedance/Shot2Story 16 Dec 2023

A human needs to capture the event in every shot and associate the events with one another to understand the story behind them.

45

An Integrated System for Spatio-Temporal Summarization of 360-degrees Videos

idt-iti/ca-sum-360 5 Dec 2023

In this work, we present an integrated system for spatiotemporal summarization of 360-degrees videos.

4

A Challenging Multimodal Video Summary: Simultaneously Extracting and Generating Keyframe-Caption Pairs from Video

keitokudo/multi-vidsum 4 Dec 2023

This paper proposes a practical multimodal video summarization task setting and a dataset to train and evaluate the task.

5

Adopting Self-Supervised Learning into Unsupervised Video Summarization through Restorative Score

mehryar72/RS-SUM Conference 2023

We show that the reconstruction loss of the model for a video with masked frames correlates with the representativeness of the remaining frames in the video.

7
11 Sep 2023