1 code implementation • 27 Nov 2023 • Chancharik Mitra, Brandon Huang, Trevor Darrell, Roei Herzig
The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks.
Ranked #30 on Visual Reasoning on Winoground
2 code implementations • 12 Nov 2023 • Chancharik Mitra, Abrar Anwar, Rodolfo Corona, Dan Klein, Trevor Darrell, Jesse Thomason
When connecting objects and their language referents in an embodied 3D environment, it is important to note that: (1) an object can be better characterized by leveraging comparative information between itself and other objects, and (2) an object's appearance can vary with camera position.