Search Results for author: Boyu Gou

Found 1 paper, 1 paper with code

GPT-4V(ision) is a Generalist Web Agent, if Grounded

1 code implementation · 3 Jan 2024 · Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su

The recent development of large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, has been quickly expanding the capability boundaries of multimodal models beyond traditional tasks like image captioning and visual question answering.

Image Captioning, Question Answering, +1
