no code implementations • 31 May 2024 • Linli Yao, Lei Li, Shuhuai Ren, Lean Wang, Yuanxin Liu, Xu Sun, Lu Hou
Specifically, we trace back the semantic relevance flow from generated language tokens to raw visual encoder patches and the intermediate outputs produced by projectors.
1 code implementation • 16 Apr 2024 • Yuchi Wang, Shuhuai Ren, Rundong Gao, Linli Yao, Qingyan Guo, Kaikai An, Jianhong Bai, Xu Sun
Diffusion models have exhibited remarkable capabilities in text-to-image generation.
Ranked #8 on Image Captioning on COCO Captions (ROUGE-L metric)
1 code implementation • 4 Dec 2023 • Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou
This work proposes TimeChat, a time-sensitive multimodal large language model specifically designed for long video understanding.
Ranked #6 on Video Question Answering on MVBench
no code implementations • 15 May 2023 • Linli Yao, Yuanmeng Zhang, Ziheng Wang, Xinglin Hou, Tiezheng Ge, Yuning Jiang, Xu Sun, Qin Jin
In this paper, we propose a novel Video Caption Editing (VCE) task to automatically revise an existing video description guided by multi-grained user requests.
1 code implementation • 21 Apr 2023 • Weijing Chen, Linli Yao, Qin Jin
The reason is that a large number of the images and texts in these benchmarks are coarse-grained.
1 code implementation • 17 Nov 2022 • Linli Yao, Weijing Chen, Qin Jin
Automatically generating textual descriptions for massive unlabeled images on the web can greatly benefit realistic web applications, e.g., multimodal retrieval and recommendation.
1 code implementation • 9 Feb 2022 • Linli Yao, Weiying Wang, Qin Jin
The Image Difference Captioning (IDC) task aims to describe the visual differences between two similar images with natural language.
1 code implementation • 12 Apr 2020 • Shizhe Chen, Weiying Wang, Ludan Ruan, Linli Yao, Qin Jin
The goal of the YouMakeup VQA Challenge 2020 is to provide a common benchmark for fine-grained action understanding in domain-specific videos, e.g., makeup instructional videos.