1 code implementation • 16 Apr 2024 • Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun
Diffusion models have exhibited remarkable capabilities in text-to-image generation.
1 code implementation • 4 Dec 2023 • Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou
This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.
no code implementations • 15 May 2023 • Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Qin Jin
In this paper, we propose a novel Video Description Editing (VDEdit) task to automatically revise an existing video description guided by flexible user requests.
1 code implementation • 21 Apr 2023 • Weijing Chen, Linli Yao, Qin Jin
The reason is that a large number of the images and texts in these benchmarks are coarse-grained.
1 code implementation • 17 Nov 2022 • Linli Yao, Weijing Chen, Qin Jin
Automatically generating textual descriptions for massive unlabeled images on the web can greatly benefit realistic web applications, e.g., multimodal retrieval and recommendation.
1 code implementation • 9 Feb 2022 • Linli Yao, Weiying Wang, Qin Jin
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images with natural language.
1 code implementation • 12 Apr 2020 • Shizhe Chen, Weiying Wang, Ludan Ruan, Linli Yao, Qin Jin
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos.