Visual Reasoning

214 papers with code • 12 benchmarks • 41 datasets

The ability to understand actions and reasoning associated with visual images.


Latest papers with no code

ChartBench: A Benchmark for Complex Visual Reasoning in Charts

no code yet • 26 Dec 2023

Multimodal Large Language Models (MLLMs) demonstrate impressive image understanding and generation capabilities.

GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives

no code yet • 7 Dec 2023

Learning scene graphs from natural language descriptions has proven to be a cheap and promising scheme for Scene Graph Generation (SGG).

Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects

no code yet • 29 Nov 2023

Unlabeled 3D objects present an opportunity to leverage pretrained vision language models (VLMs) on a range of annotation tasks -- from describing object semantics to physical properties.

From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation

no code yet • 21 Nov 2023

To address the challenge of adapting pre-trained vision-language models to generate insightful explanations for visual reasoning tasks with limited annotations, we present ReVisE: a Recursive Visual Explanation algorithm.

SelfEval: Leveraging the discriminative nature of generative models for evaluation

no code yet • 17 Nov 2023

In this work, we show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities in a completely automated manner.

The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task

no code yet • 15 Nov 2023

The study explores whether the Chain-of-Thought approach, known for improving language tasks by breaking them into sub-tasks and intermediate steps, can likewise improve vision-language tasks that demand sophisticated perception and reasoning.
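The sub-task decomposition described above is typically elicited through prompting. A minimal, illustrative sketch (the prompt wording and helper function are assumptions, not the paper's actual method):

```python
# Minimal sketch of Chain-of-Thought prompting: append a reasoning
# trigger so the model emits intermediate steps before its answer.
# The prompt template below is an illustrative assumption, not the
# exact prompt used in the paper.
def build_cot_prompt(question: str) -> str:
    """Wrap a visual question with a step-by-step reasoning trigger."""
    return (
        f"Question: {question}\n"
        "Let's think step by step, describing what is in the image "
        "before giving the final answer."
    )

prompt = build_cot_prompt("How many red cubes are left of the sphere?")
print(prompt)
```

The resulting prompt would be passed, together with the image, to a vision-language model; the "step by step" cue encourages the model to verbalize intermediate perceptual observations rather than answer directly.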

Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels

no code yet • NeurIPS 2023

In this study, we investigate a critical functional role of such adaptive processing using recurrent neural networks: dynamically scaling computational resources conditional on input requirements, which allows zero-shot generalization to difficulty levels not seen during training. We evaluate this on two challenging visual reasoning tasks: PathFinder and Mazes.

Visual Commonsense based Heterogeneous Graph Contrastive Learning

no code yet • 11 Nov 2023

Specifically, our model contains two key components: the Commonsense-based Contrastive Learning and the Graph Relation Network.

Towards A Unified Neural Architecture for Visual Recognition and Reasoning

no code yet • 10 Nov 2023

Motivated by the recent success of multi-task transformers for visual recognition and language understanding, we propose a unified neural architecture for visual recognition and reasoning with a generic interface (e.g., tokens) for both.

GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

no code yet • 8 Nov 2023

If not, we initialize a new module needed by the task and specify the inputs and outputs of this new module.