5 Jul 2022 • Jeiyoon Park, Kiho Kwoun, Chanhee Lee, Heuiseok Lim
Second, existing datasets for generic video summarization are insufficient for training both the caption generator that extracts text information from a video and the multimodal feature extractors.