The CLEVR dataset has been used extensively for language-grounded visual reasoning in the Machine Learning (ML) and Natural Language Processing (NLP) domains.
While humans can solve a visual puzzle that requires logical reasoning after observing only a few samples, state-of-the-art deep reasoning models require training on large amounts of data to reach similar performance on the same task.
Abstract visual reasoning connects mental abilities to the physical world, which is a crucial factor in cognitive development.
A self-segregation strategy for attention contributes to better understanding and filtering of the information most helpful for answering the question, and creates diversity of visual reasoning in attention.
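As a rough illustration of question-guided filtering of visual information (a minimal sketch only; the module name, tensor shapes, and PyTorch framing are assumptions, not this paper's implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedAttention(nn.Module):
    # Hypothetical sketch, not taken from the paper above: scores each
    # visual region against the question and keeps a soft,
    # question-dependent subset of the visual features.
    def __init__(self, vis_dim, q_dim, hidden):
        super().__init__()
        self.proj_v = nn.Linear(vis_dim, hidden)
        self.proj_q = nn.Linear(q_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, regions, question):
        # regions: (batch, num_regions, vis_dim); question: (batch, q_dim)
        h = torch.tanh(self.proj_v(regions) + self.proj_q(question).unsqueeze(1))
        alpha = F.softmax(self.score(h), dim=1)        # attention over regions
        return (alpha * regions).sum(dim=1), alpha     # filtered summary + weights

The attention weights alpha act as the filter: regions irrelevant to the question receive near-zero weight and contribute little to the summary passed downstream.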
To address this, we propose (1) a framework to isolate and evaluate the reasoning aspect of VQA separately from its perception, and (2) a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.
We have tested MXGNet on two types of diagrammatic reasoning tasks, namely Diagram Syllogisms and Raven's Progressive Matrices (RPM).
This is made possible by encoding the objects in the scene as inputs to the neural network, instead of a single fixed feature vector for the whole image.
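A minimal sketch of what object-level input can look like, loosely in the style of relation networks; the module name, shapes, and pairwise aggregation here are assumptions rather than the paper's actual code:

import torch
import torch.nn as nn

class ObjectPairEncoder(nn.Module):
    # Hypothetical sketch: encodes a scene as a set of object vectors
    # and aggregates all object pairs, rather than consuming one fixed
    # whole-image feature vector.
    def __init__(self, obj_dim, hidden):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(2 * obj_dim, hidden), nn.ReLU(),
                               nn.Linear(hidden, hidden))

    def forward(self, objects):
        # objects: (batch, n, obj_dim) -- one row per detected object
        b, n, d = objects.shape
        oi = objects.unsqueeze(2).expand(b, n, n, d)
        oj = objects.unsqueeze(1).expand(b, n, n, d)
        pairs = torch.cat([oi, oj], dim=-1)            # all ordered object pairs
        return self.g(pairs).sum(dim=(1, 2))           # permutation-invariant sum

Because the sum over pairs is order-independent, the encoding does not depend on how objects happen to be listed, which a flattened fixed-size feature vector cannot guarantee.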
Most state-of-the-art (SoTA) VQA methods fail to answer these questions because of (i) poor text-reading ability; (ii) a lack of text-visual reasoning capacity; and (iii) adopting a discriminative answering mechanism rather than a generative one, which makes it hard to cover both OCR tokens and general text tokens in the final answer.
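One common generative remedy for (iii) is to score a fixed vocabulary and the image's OCR tokens in a single softmax, so the decoder can copy scene text into the answer; a minimal sketch under that assumption (module and parameter names are hypothetical):

import torch
import torch.nn as nn

class JointAnswerScorer(nn.Module):
    # Hypothetical sketch: scores general vocabulary words and
    # per-image OCR tokens jointly, so decoding can either generate a
    # common word or copy a piece of scene text.
    def __init__(self, hidden, vocab_size, ocr_dim):
        super().__init__()
        self.vocab_head = nn.Linear(hidden, vocab_size)  # fixed vocabulary
        self.ocr_proj = nn.Linear(ocr_dim, hidden)       # dynamic OCR pointers

    def forward(self, state, ocr_feats):
        # state: (batch, hidden); ocr_feats: (batch, num_ocr, ocr_dim)
        vocab_logits = self.vocab_head(state)
        ocr_logits = torch.bmm(self.ocr_proj(ocr_feats),
                               state.unsqueeze(-1)).squeeze(-1)
        # One distribution over vocabulary words plus this image's OCR tokens
        return torch.cat([vocab_logits, ocr_logits], dim=-1)

Because the OCR logits are computed against the image's own detected tokens, the answer space adapts per image instead of being limited to a pre-built answer list.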