The Video-based Multimodal Summarization with Multimodal Output (VMSMO) corpus consists of 184,920 document-summary pairs, with 180,000 training pairs, 2,460 validation and test pairs. The task for this dataset is generating and appropriate textual summary of an article and choosing a proper cover frame from a video accompanying the article.
8 PAPERS • NO BENCHMARKS YET
OSAI introduces OpenTTGames - an open dataset aimed at evaluation of different computer vision tasks in Table Tennis: ball detection, semantic segmentation of humans, table and scoreboard and fast in-game events spotting.
3 PAPERS • NO BENCHMARKS YET