However, existing visual reasoning methods designed for visual question answering are not appropriate for video captioning, which requires more complex visual reasoning over both space and time on videos, as well as dynamic module composition along the generation process.
However, existing query-based reasoning methods have not considered the handling of inter-dependent queries, which is a unique requirement of semantic role prediction in SR.
This work deals with the challenge of learning and reasoning over language and vision data for related downstream tasks such as visual question answering (VQA) and natural language for visual reasoning (NLVR).
We present the Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains, with applications in visual question answering.
This paper presents a novel attention-based algorithm for achieving adaptive computation called DACT, which, unlike existing ones, is end-to-end differentiable.
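The abstract above does not spell out DACT's mechanism, but the general idea of end-to-end differentiable adaptive computation can be illustrated with a generic ACT-style halting loop. The sketch below is an assumption-laden illustration, not the paper's algorithm: each step's output is weighted by a soft halting probability so the final result is a smooth mixture over steps rather than a hard stopping decision.

```python
# Hedged sketch of ACT-style adaptive computation (NOT the DACT
# algorithm itself): per-step outputs are blended with soft halting
# weights, keeping the whole computation differentiable in principle.

def adaptive_compute(step_fn, halt_fn, state, max_steps=10, eps=0.01):
    """Run step_fn repeatedly; weight each step's state by the halting
    probability from halt_fn until the remaining probability mass
    drops below eps or max_steps is reached."""
    remainder = 1.0          # probability mass not yet assigned
    output = 0.0             # accumulated, halting-weighted output
    for _ in range(max_steps):
        state = step_fn(state)
        p = halt_fn(state)               # halting probability in (0, 1)
        w = min(p, remainder)            # spend at most the leftover mass
        output += w * state
        remainder -= w
        if remainder < eps:
            break
    output += remainder * state          # leftover mass goes to last step
    return output
```

With `step_fn = lambda s: s + 1.0` and a constant halting probability of 0.5, the loop spends half its mass on each of the first two steps and returns the mixture `0.5 * 1 + 0.5 * 2 = 1.5`.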
Abstract reasoning refers to the ability to analyze information, discover rules at an intangible level, and solve problems in innovative ways.
Scene Graph Generation (SGG) aims to extract entities, predicates and their semantic structure from images, enabling deep understanding of visual content, with many applications such as visual reasoning and image retrieval.
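The output structure named above, entities plus predicates, can be made concrete with a minimal data-structure sketch. This is not any particular SGG model's representation, just an assumed illustration of a scene graph as (subject, predicate, object) triples with the kind of partial-pattern query that image-retrieval applications build on.

```python
# Hedged sketch (not a specific SGG model's output format): a scene
# graph as an entity set plus (subject, predicate, object) triples,
# with a small pattern-matching helper for retrieval-style queries.

class SceneGraph:
    def __init__(self):
        self.entities = set()
        self.triples = []    # list of (subject, predicate, object)

    def add(self, subj, pred, obj):
        """Record a relation and register both endpoints as entities."""
        self.entities.update([subj, obj])
        self.triples.append((subj, pred, obj))

    def match(self, subj=None, pred=None, obj=None):
        """Return all triples matching the (possibly partial) pattern."""
        return [t for t in self.triples
                if (subj is None or t[0] == subj)
                and (pred is None or t[1] == pred)
                and (obj is None or t[2] == obj)]
```

For example, a graph holding ("man", "riding", "horse") and ("man", "wearing", "hat") answers `match(pred="riding")` with the single riding triple, which is the shape of query a retrieval system would issue against generated graphs.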
In this paper, we use the task of Audio Question Answering (AQA) to study the temporal reasoning abilities of machine learning models.
Ranked #1 on Audio Question Answering on DAQA
While these models thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive and counterfactual), suggesting that a principled approach for causal reasoning should incorporate the capability of both perceiving complex visual and language inputs, and understanding the underlying dynamics and causal relations.