MultiSum is a dataset for multimodal summarization with multimodal output (MSMO). The dataset features:
1) Human-validated summaries for both video and textual content, providing high-quality human instruction and labels for multimodal learning.
2) A comprehensive, carefully organized categorization spanning 17 principal categories and 170 subcategories that cover a diverse array of real-world scenarios.
3) Benchmark tests on the dataset covering a range of tasks and methods, including video temporal segmentation, video summarization, text summarization, and multimodal summarization.
Source: MultiSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos