Search Results for author: Junjie Fei

Found 3 papers, 2 papers with code

Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding

no code implementations • 29 May 2024 • Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed BAKR, Mohamed Elhoseiny

We propose two novel tasks: (1) Part-Aware Point Grounding, in which the model is tasked with directly predicting a part-level segmentation mask based on user instructions, and (2) Part-Aware Point Grounded Captioning, in which the model provides a detailed caption that includes part-level descriptions and their corresponding masks.


Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

1 code implementation • ICCV 2023 • Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng

In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios.

Caption Generation, Hallucination, +2 more tasks

134 stars


Caption Anything: Interactive Image Description with Diverse Multimodal Controls

1 code implementation • 4 May 2023 • Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao

Controllable image captioning is an emerging multimodal topic that aims to describe an image in natural language according to human intent, e.g., focusing on specified regions or writing in a particular text style.

Controllable Image Captioning, Instruction Following

1,613 stars

