21 Nov 2023 • Sirui Cheng, Siyu Zhang, Jiayi Wu, Muchen Lan
In the multimodal field, large vision-language models (LVLMs) have made significant progress, owing to their strong perception and reasoning capabilities across the visual and language modalities.