1 code implementation • 3 Jan 2024 • Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su
Recent developments in large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, have rapidly expanded the capabilities of multimodal models beyond traditional tasks such as image captioning and visual question answering.