no code implementations • 18 Nov 2023 • David Noever, Samantha Elizabeth Miller Noever
This study explores the capabilities of multimodal large language models (LLMs) on challenging multistep tasks that integrate language and vision, focusing on model steerability, composability, and the use of long-term memory and contextual understanding.
no code implementations • 17 Aug 2023 • David Noever, Samantha Elizabeth Miller Noever
To address the gap in understanding the visual comprehension abilities of Large Language Models (LLMs), we designed a challenge-response study that subjected Google Bard and GPT-Vision to 64 visual tasks spanning categories such as "Visual Situational Reasoning" and "Next Scene Prediction."