1 code implementation • 20 May 2024 • Zhenwei Shao, Zhou Yu, Jun Yu, Xuecheng Ouyang, Lihao Zheng, Zhenbiao Gai, Mingyang Wang, Jiajun Ding
By harnessing the capabilities of large language models (LLMs), recent large multimodal models (LMMs) have shown remarkable versatility in open-world multimodal understanding.
1 code implementation • CVPR 2023 • Zhou Yu, Xuecheng Ouyang, Zhenwei Shao, Meng Wang, Jun Yu
Knowledge-based visual question answering (VQA) requires external knowledge beyond the image to answer the question.
Ranked #3 on Visual Question Answering (VQA) on A-OKVQA