Visual Reasoning
211 papers with code • 12 benchmarks • 41 datasets
The ability to understand and reason about the actions and relationships depicted in visual images.
Latest papers
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.
Compositional Chain-of-Thought Prompting for Large Multimodal Models
The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks.
Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method
This project focuses on visual analogical reasoning and extends a generalized mechanism originally developed for solving verbal analogies to the visual domain.
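A minimal sketch of the word2vec-style vector arithmetic the abstract alludes to, using hand-set toy embeddings; in the paper's setting the vectors would come from a neural encoder over ARC grids, and all names and values below are illustrative, not from the paper:

```python
import numpy as np

# Toy embedding table standing in for learned neural embeddings.
emb = {
    "small_square": np.array([1.0, 0.0, 0.0]),
    "large_square": np.array([1.0, 1.0, 0.0]),
    "small_circle": np.array([0.0, 0.0, 1.0]),
    "large_circle": np.array([0.0, 1.0, 1.0]),
}

def solve_analogy(a, b, c, table):
    """A : B :: C : ?  ->  nearest neighbor of emb(B) - emb(A) + emb(C)."""
    target = table[b] - table[a] + table[c]
    candidates = {k: v for k, v in table.items() if k not in (a, b, c)}
    return min(candidates, key=lambda k: np.linalg.norm(candidates[k] - target))

print(solve_analogy("small_square", "large_square", "small_circle", emb))
# -> "large_circle": the "enlarge" transformation transfers across shapes.
```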
NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment
We present a loosely coupled neuro-symbolic visual understanding and reasoning framework: a DNN-based pipeline performs object detection and multi-modal pairwise relationship prediction to generate scene graphs, which are then enriched with common-sense knowledge from heterogeneous knowledge graphs to improve downstream reasoning.
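As a rough illustration of scene-graph enrichment with common-sense facts (the triple format and knowledge entries below are hypothetical stand-ins, not the paper's actual pipeline or KG schema):

```python
# Scene graph as (subject, predicate, object) triples from a detection pipeline.
scene_graph = [("person", "holding", "umbrella"), ("person", "on", "sidewalk")]

# Toy stand-in for a common-sense knowledge graph such as ConceptNet.
commonsense_kg = {
    "umbrella": [("umbrella", "used_for", "rain_protection")],
    "sidewalk": [("sidewalk", "part_of", "street")],
}

def enrich(graph, kg):
    """Append KG facts about every entity mentioned in the scene graph."""
    entities = {e for s, _, o in graph for e in (s, o)}
    enriched = list(graph)
    for entity in entities:
        enriched.extend(kg.get(entity, []))
    return enriched

for triple in enrich(scene_graph, commonsense_kg):
    print(triple)
```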
What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning
By conducting a comprehensive empirical study, we find that instructions focused on complex visual reasoning tasks are particularly effective in improving the performance of MLLMs on evaluation benchmarks.
Weakly Supervised Semantic Parsing with Execution-based Spurious Program Filtering
The problem of spurious programs is a longstanding challenge when training a semantic parser from weak supervision.
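A toy sketch of the general idea behind execution-based filtering under weak supervision: many candidate programs happen to reach the labeled answer, and executing them on additional inputs exposes behavioral differences that can be used to filter likely-spurious ones. The mini-DSL below is invented for illustration and is not the paper's formalism:

```python
# Candidate "programs" mapping a list of numbers to an answer; under weak
# supervision we only know the gold answer for an input, not the gold program.
programs = {
    "max":     lambda xs: max(xs),
    "last":    lambda xs: xs[-1],
    "sum-3":   lambda xs: sum(xs) - 3,
    "first+3": lambda xs: xs[0] + 3,
}

labeled_input, gold_answer = [1, 4, 2, 4], 4
# Several programs are consistent with the single labeled example:
consistent = {n: p for n, p in programs.items() if p(labeled_input) == gold_answer}

# Executing on extra inputs yields per-program "execution signatures";
# programs whose signatures diverge from the rest can be down-weighted
# or filtered as likely spurious.
probe_inputs = [[5, 1, 2], [2, 2, 9]]
signatures = {n: tuple(p(x) for x in probe_inputs) for n, p in consistent.items()}
print(signatures)
```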
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese
Neural models for VQA have made remarkable progress on large-scale datasets, with a primary focus on resource-rich languages like English.
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains with a differentiable, domain-independent, first-order logic-based program executor.
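A very small sketch of what a differentiable first-order-logic executor can look like, using product/max fuzzy semantics over per-object concept scores; the concept scores and operator choices here are illustrative assumptions, not LEFT's actual implementation:

```python
import numpy as np

# Soft concept scores in [0, 1] for three objects in a scene, e.g. produced
# by a neural grounding module.
red    = np.array([0.9, 0.1, 0.8])   # red(x) per object
round_ = np.array([0.2, 0.95, 0.7])  # round(x) per object

# Fuzzy-logic connectives: everything stays differentiable w.r.t. the scores.
AND    = lambda p, q: p * q          # conjunction as elementwise product
EXISTS = lambda p: np.max(p)         # existential quantifier as max over objects

# Execute the program: exists x. red(x) AND round(x)
score = EXISTS(AND(red, round_))
print(score)  # 0.8 * 0.7 = 0.56 -> soft truth of "there is a red round object"
```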
Interpreting and Controlling Vision Foundation Models via Text Explanations
Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks.
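The title suggests mapping internal activations to natural-language descriptions; one generic way to do this in a shared CLIP-style embedding space (a sketch under assumptions, not the paper's method) is to rank a vocabulary of text embeddings by similarity to a visual feature. The phrases and vectors below are made up for illustration:

```python
import numpy as np

# Hypothetical text embeddings in the same space as the visual feature,
# e.g. obtained from a CLIP text encoder.
text_emb = {
    "a furry animal": np.array([0.8, 0.1, 0.2]),
    "a city street":  np.array([0.1, 0.9, 0.3]),
    "a bowl of food": np.array([0.2, 0.2, 0.9]),
}

def explain(feature, vocab, top_k=1):
    """Rank candidate phrases by cosine similarity to a visual feature."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    ranked = sorted(vocab, key=lambda t: cos(feature, vocab[t]), reverse=True)
    return ranked[:top_k]

visual_feature = np.array([0.75, 0.2, 0.25])  # stand-in for a model activation
print(explain(visual_feature, text_emb))      # -> ['a furry animal']
```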
Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World
We also devise a neuro-symbolic reasoning approach that combines LLMs and VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems.