Visual Reasoning

207 papers with code • 12 benchmarks • 41 datasets

The ability to understand actions and reasoning associated with visual images.


Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually

secureaiautonomylab/conditionalvlm 19 Jan 2024

This process involves addressing two key problems: (1) obfuscating an unsafe image requires the platform to provide an accurate rationale grounded in attributes specific to that image, and (2) the unsafe regions of the image must be minimally obfuscated while the safe regions are still depicted.

0 ★

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

shi-labs/vcoder 21 Dec 2023

Secondly, we leverage the images from COCO and outputs from off-the-shelf vision perception models to create our COCO Segmentation Text (COST) dataset for training and evaluating MLLMs on the object perception task.

216 ★

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

bradyfu/awesome-multimodal-large-language-models 19 Dec 2023

They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

8,092 ★

One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems

mikomel/sal 15 Dec 2023

With the aim of developing universal learning systems in the AVR domain, we propose a unified model for solving Single-Choice Abstract visual Reasoning tasks (SCAR), which can solve various single-choice AVR tasks without any a priori assumptions about the task structure, in particular the number and location of panels.

0 ★

BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

aifeg/benchlmm 5 Dec 2023

Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles.

80 ★

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

artemisp/lavis-xinstructblip 30 Nov 2023

Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs).

36 ★

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

01-ai/yi 27 Nov 2023

We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning.

6,964 ★

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

ucsc-vlaa/vllm-safety-benchmark 27 Nov 2023

Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.

43 ★

Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method

foger3/arc_deeplearning 14 Nov 2023

This project focuses on visual analogical reasoning and applies the initial generalized mechanism used to solve verbal analogies to the visual realm.

5 ★

NeuSyRE: Neuro-Symbolic Visual Understanding and Reasoning Framework based on Scene Graph Enrichment

jaleedkhan/neusire Semantic Web 2023

We present a loosely coupled neuro-symbolic visual understanding and reasoning framework that employs a DNN-based pipeline for object detection and multi-modal pairwise relationship prediction to generate scene graphs, and leverages common-sense knowledge in heterogeneous knowledge graphs to enrich those scene graphs for improved downstream reasoning.

7 ★