1 code implementation • 27 Nov 2023 • Chancharik Mitra, Brandon Huang, Trevor Darrell, Roei Herzig
The combination of strong visual backbones and Large Language Model (LLM) reasoning has made Large Multimodal Models (LMMs) the current standard for a wide range of vision and language (VL) tasks.
Ranked #30 on Visual Reasoning on Winoground