Visual Reasoning

211 papers with code • 12 benchmarks • 41 datasets

The ability to understand the actions and reasoning associated with visual images.

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

ucsc-vlaa/vllm-safety-benchmark 27 Nov 2023

Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.

44

Compositional Chain-of-Thought Prompting for Large Multimodal Models

chancharikmitra/ccot 27 Nov 2023

The combination of strong visual backbones and Large Language Model (LLM) reasoning has led to Large Multimodal Models (LMMs) becoming the current standard for a wide range of vision and language (VL) tasks.

12

Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method

foger3/arc_deeplearning 14 Nov 2023

This project focuses on visual analogical reasoning and applies the initial generalized mechanism used to solve verbal analogies to the visual realm.

6
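
The vector-arithmetic mechanism known from verbal analogies (a is to b as c is to ?) can be sketched as follows. This is a minimal illustration and not the project's own code: the function name `solve_analogy`, the 2-D embeddings, and the candidate list are all hypothetical, and a real system would obtain embeddings from a trained encoder.

```python
import numpy as np

def solve_analogy(a, b, c, candidates):
    """Return the index of the candidate most similar to c + (b - a).

    a, b, c are embedding vectors; candidates is a list of embedding
    vectors for the possible answers. Similarity is cosine similarity.
    """
    # Apply the a -> b offset to c to get the predicted answer embedding.
    target = c + (b - a)
    cands = np.asarray(candidates, dtype=float)
    # Cosine similarity between the target and each candidate.
    norms = np.linalg.norm(cands, axis=1) * np.linalg.norm(target)
    sims = (cands @ target) / (norms + 1e-12)
    return int(np.argmax(sims))

# Toy embeddings, invented purely for illustration.
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
c = np.array([3.0, 0.0])
candidates = [
    np.array([3.0, 0.0]),
    np.array([3.0, 1.0]),
    np.array([0.0, 3.0]),
]
best = solve_analogy(a, b, c, candidates)
```

With these toy values the predicted embedding is `[3.0, 1.0]`, so the second candidate (index 1) is selected.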

NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment

jaleedkhan/neusire Semantic Web 2023

We present a loosely-coupled neuro-symbolic visual understanding and reasoning framework that employs a DNN-based pipeline for object detection and multi-modal pairwise relationship prediction for scene graph generation and leverages common sense knowledge in heterogeneous knowledge graphs to enrich scene graphs for improved downstream reasoning.

7
05 Nov 2023

What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

rucaibox/comvint 2 Nov 2023

By conducting a comprehensive empirical study, we find that instructions focused on complex visual reasoning tasks are particularly effective in improving the performance of MLLMs on evaluation benchmarks.

18

Weakly Supervised Semantic Parsing with Execution-based Spurious Program Filtering

klee972/exec-filter 2 Nov 2023

The problem of spurious programs is a longstanding challenge when training a semantic parser from weak supervision.

15

ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese

kvt0012/viclevr 27 Oct 2023

Neural models for VQA have made remarkable progress on large-scale datasets, with a primary focus on resource-rich languages like English.

1

What's Left? Concept Grounding with Logic-Enhanced Foundation Models

joyhsu0504/left 24 Oct 2023

We propose the Logic-Enhanced Foundation Model (LEFT), a unified framework that learns to ground and reason with concepts across domains with a differentiable, domain-independent, first-order logic-based program executor.

31

Interpreting and Controlling Vision Foundation Models via Text Explanations

tonychenxyz/vit-interpret 16 Oct 2023

Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks.

10

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

joyjayng/Bongard-OpenWorld 16 Oct 2023

We even conceived a neuro-symbolic reasoning approach that reconciles LLMs & VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems.

5