no code implementations • 29 May 2024 • Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed BAKR, Mohamed Elhoseiny
Therefore, we propose two novel tasks: (1) Part-Aware Point Grounding, the model is tasked with directly predicting a part-level segmentation mask based on user instructions, and (2) Part-Aware Point Grounded Captioning, the model provides a detailed caption that includes part-level descriptions and their corresponding masks.
1 code implementation • ICCV 2023 • Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng
In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios.
1 code implementation • 4 May 2023 • Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao
Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, $\textit{e. g.}$, looking at the specified regions or telling in a particular text style.