no code implementations • 4 Dec 2023 • Yanchu Guan, Dong Wang, Zhixuan Chu, Shiyu Wang, Feiyue Ni, Ruihua Song, Longfei Li, Jinjie Gu, Chenyi Zhuang
This paper proposes a novel LLM-based virtual assistant that can automatically perform multi-step operations within mobile apps based on high-level user requests.
no code implementations • 31 Dec 2022 • Xu Gu, Yuchong Sun, Feiyue Ni, ShiZhe Chen, Xihua Wang, Ruihua Song, Boyuan Li, Xiang Cao
In this paper, we propose a new task called Text synopsis to Video Storyboard (TeViS) which aims to retrieve an ordered sequence of images as the video storyboard to visualize the text synopsis.
no code implementations • 11 Jul 2022 • Jinbin Bai, Chunhui Liu, Feiyue Ni, Haofan Wang, Mengying Hu, Xiaofeng Guo, Lele Cheng
To overcome the above issue, we present a novel mechanism for learning the translation relationship from a source modality space $\mathcal{S}$ to a target modality space $\mathcal{T}$ without the need for a joint latent space, which bridges the gap between visual and textual domains.
Ranked #11 on Zero-Shot Video Retrieval on MSVD