Visual Reasoning

212 papers with code • 12 benchmarks • 41 datasets

The ability to understand the actions and reasoning associated with visual images

Benchmarks


These leaderboards are used to track progress in Visual Reasoning

Dataset	Best Model
Winoground	GPT-4V (CoT, pick b/w two options)
NLVR2 Dev	BEiT-3
NLVR2 Test	BEiT-3
WinoGAViL	Humans
Bongard-OpenWorld	Human
VSR	LXMERT
PHYRE-1B-Within	RPIN
PHYRE-1B-Cross	RPIN
VASR	Swin
NLVR	VisualBERT
IRFL: Image Recognition of Figurative Language	Humans
CLEVRER	AI Core


Libraries

Use these libraries to find Visual Reasoning models and implementations

huggingface/transformers: 5 papers, 124,527 GitHub stars

facebookresearch/multimodal: 4 papers, 1,286 GitHub stars

salesforce/lavis: 3 papers, 8,674 GitHub stars

kakao/DAFT: 3 papers

Seven libraries are listed in total; four are shown here.


Subtasks

Visual Commonsense Reasoning

Latest papers with no code


Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases

no code yet • 16 Apr 2024

Owing to their remarkable visual reasoning ability in understanding images and videos, Large Vision-Language Models (LVLMs) have received widespread attention in the autonomous driving domain, significantly advancing the development of interpretable end-to-end autonomous driving.


Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

no code yet • 9 Apr 2024

In this note, we revisit the IMO-AG-30 Challenge introduced with AlphaGeometry, and find that Wu's method is surprisingly strong.


Plug-and-Play Grounding of Reasoning in Multimodal Large Language Models

no code yet • 28 Mar 2024

The surge of Multimodal Large Language Models (MLLMs), given their prominent emergent capabilities in instruction following and reasoning, has greatly advanced the field of visual reasoning.


PropTest: Automatic Property Testing for Improved Visual Programming

no code yet • 25 Mar 2024

Visual Programming has emerged as an alternative to end-to-end black-box visual reasoning models.


VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding

no code yet • 21 Mar 2024

This paper introduces a Video Understanding and Reasoning Framework (VURF) based on the reasoning power of LLMs.


HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning

no code yet • 19 Mar 2024

Recent advances in visual reasoning (VR), particularly with the aid of Large Vision-Language Models (VLMs), show promise but require access to large-scale datasets and face challenges such as high computational costs and limited generalization capabilities.


Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning

no code yet • 10 Mar 2024

Several approaches aim to adapt vision-language pre-trained (VLP) models efficiently to downstream tasks with limited supervision, leveraging the knowledge these models have already acquired.


SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection

no code yet • 5 Mar 2024

Misinformation is a prevalent societal issue because of its potentially high risks.


VISREAS: Complex Visual Reasoning with Unanswerable Questions

no code yet • 23 Feb 2024

The task's distinctive feature (validating whether a question is answerable with respect to an image before answering it), together with the poor performance of state-of-the-art models, inspired the design of a new modular baseline, LOGIC2VISION, which reasons by producing and executing pseudocode, without any external modules, to generate the answer.


Visual In-Context Learning for Large Vision-Language Models

no code yet • 18 Feb 2024

In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities.

