MultiSum is a dataset for multimodal summarization (MSMO). It consists of 17 categories and 170 subcategories to encapsulate a diverse array of real-world scenarios. The dataset features:

1)Human-validated summaries for both video and textual content, providing superior human instruction and labels for multimodal learning.

2) Comprehensively and meticulously arranged categorization, spanning 17 principal categories and 170 subcategories to encapsulate a diverse array of real-world scenarios.

3) Benchmark tests performed on the proposed dataset to assess varied tasks and methods, including video temporal segmentation, video summarization, text summarization, and multimodal summarization.

Source: MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages