Visual Reasoning
214 papers with code • 12 benchmarks • 41 datasets
The ability to understand and reason about actions and relationships depicted in visual images.
Latest papers with no code
ChartBench: A Benchmark for Complex Visual Reasoning in Charts
Multimodal Large Language Models (MLLMs) demonstrate impressive image understanding and generation capabilities.
GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives
Learning scene graphs from natural language descriptions has proven to be a cheap and promising scheme for Scene Graph Generation (SGG).
Evaluating VLMs for Score-Based, Multi-Probe Annotation of 3D Objects
Unlabeled 3D objects present an opportunity to leverage pretrained vision language models (VLMs) on a range of annotation tasks -- from describing object semantics to physical properties.
From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation
Addressing the challenge of adapting pre-trained vision-language models for generating insightful explanations for visual reasoning tasks with limited annotations, we present ReVisE: a Recursive Visual Explanation algorithm.
SelfEval: Leveraging the discriminative nature of generative models for evaluation
In this work, we show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities in a completely automated manner.
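The "inversion" idea above can be sketched generically: a generative model's likelihood of an image given a caption is used as a discriminative score, and the highest-scoring caption is taken as the model's answer. The `log_likelihood` callable and the toy word-overlap scorer below are illustrative assumptions, not the paper's actual implementation (real diffusion models approximate this likelihood, e.g. via an ELBO).

```python
import math

def select_caption(image, captions, log_likelihood):
    """Pick the caption the generative model scores highest under
    log p(image | caption) -- using a generative model as a classifier.

    log_likelihood is an assumed callable (image, caption) -> float.
    """
    scores = {c: log_likelihood(image, c) for c in captions}
    return max(scores, key=scores.get), scores

def toy_log_likelihood(image_words, caption):
    # Purely illustrative stand-in: score by word overlap with a
    # fake "image description" set, not a real generative model.
    overlap = len(image_words & set(caption.split()))
    return math.log1p(overlap)

image_words = {"dog", "grass"}
captions = ["a dog on grass", "a cat indoors"]
best, scores = select_caption(image_words, captions, toy_log_likelihood)
print(best)  # -> "a dog on grass"
```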
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task
The study explores whether the Chain-of-Thought approach, known for its success on language tasks where it breaks problems into sub-tasks and intermediate steps, can also improve vision-language tasks that demand sophisticated perception and reasoning.
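The sub-task decomposition described above can be sketched as a prompt builder for a vision-language model. The perceive/relate/answer step list and the template wording are illustrative assumptions, not a template prescribed by the paper.

```python
def build_cot_prompt(question, steps=None):
    """Assemble a chain-of-thought prompt for a vision-language query.

    The default decomposition (perceive -> relate -> answer) is an
    illustrative assumption about how a visual question might be
    broken into intermediate steps.
    """
    steps = steps or [
        "List the objects visible in the image.",
        "Describe the relationships between the relevant objects.",
        "Combine the observations to answer the question.",
    ]
    lines = [f"Question: {question}", "Let's reason step by step:"]
    lines += [f"{i}. {s}" for i, s in enumerate(steps, 1)]
    return "\n".join(lines)

prompt = build_cot_prompt("Is the mug left of the laptop?")
print(prompt)
```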
Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels
In this study, we investigate a critical functional role of adaptive processing in recurrent neural networks: dynamically scaling computational resources to match input requirements, which enables zero-shot generalization to difficulty levels not seen during training. We evaluate this on two challenging visual reasoning tasks: PathFinder and Mazes.
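The input-conditional compute scaling described above can be sketched with a halting-style recurrent loop: harder inputs accumulate halting mass more slowly, so they receive more iterations before stopping. The state update and halting rule below are toy assumptions, not the paper's trained networks.

```python
def adaptive_recurrent_steps(difficulty, max_steps=50, threshold=0.99):
    """Run a toy recurrent update until cumulative halting mass
    crosses a threshold -- more difficult inputs take more steps.

    `difficulty` is an assumed scalar; real models learn a halting
    score from the hidden state instead of this fixed rule.
    """
    h = 0.0     # toy recurrent hidden state
    halt = 0.0  # cumulative halting probability
    for step in range(1, max_steps + 1):
        h = 0.5 * h + 0.1                   # toy state update
        halt += 1.0 / (difficulty + 1.0)    # harder -> slower halting
        if halt >= threshold:
            break
    return step

easy = adaptive_recurrent_steps(1.0)   # halts after few iterations
hard = adaptive_recurrent_steps(10.0)  # needs many more iterations
```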
Visual Commonsense based Heterogeneous Graph Contrastive Learning
Specifically, our model contains two key components: the Commonsense-based Contrastive Learning and the Graph Relation Network.
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
Motivated by the recent success of multi-task transformers for visual recognition and language understanding, we propose a unified neural architecture for visual recognition and reasoning with a generic interface (e.g., tokens) for both.
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
If no existing module fits the task, we initialize a new module and specify its inputs and outputs.
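The grow-and-reuse workflow above can be sketched as a module registry: reuse a stored module when one matches the task's signature, otherwise build and register a new one. The signature keying and dict-based "module" are illustrative assumptions, not GENOME's actual code.

```python
class ModuleLibrary:
    """Registry sketch for growing and reusing visual-reasoning modules.

    A module is keyed by (name, inputs, outputs); a miss triggers
    initialization of a new module, a hit reuses the stored one.
    """
    def __init__(self):
        self.modules = {}

    def get_or_create(self, name, inputs, outputs, build_fn):
        key = (name, tuple(inputs), tuple(outputs))
        if key not in self.modules:        # grow: no match, make one
            self.modules[key] = build_fn(inputs, outputs)
        return self.modules[key]           # reuse on later tasks

lib = ModuleLibrary()
make = lambda ins, outs: {"inputs": ins, "outputs": outs}
first = lib.get_or_create("count", ["image", "object"], ["number"], make)
again = lib.get_or_create("count", ["image", "object"], ["number"], make)
print(first is again)  # -> True: the second task reuses the module
```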