no code implementations • 31 Jan 2024 • Yihan Wu, Soumi Maiti, Yifan Peng, Wangyou Zhang, Chenda Li, Yuyue Wang, Xihua Wang, Shinji Watanabe, Ruihua Song
Existing speech language models typically utilize task-dependent prompt tokens to unify various speech tasks in a single model.
no code implementations • 31 Dec 2022 • Xu Gu, Yuchong Sun, Feiyue Ni, ShiZhe Chen, Xihua Wang, Ruihua Song, Boyuan Li, Xiang Cao
In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS) which aims to retrieve an ordered sequence of images as the video storyboard to visualize the text synopsis.