Visual Reasoning

215 papers with code • 12 benchmarks • 41 datasets

The ability to understand actions in, and reason about, arbitrary visual images.

Libraries

Use these libraries to find Visual Reasoning models and implementations
See all 7 libraries.

Interpreting and Controlling Vision Foundation Models via Text Explanations

tonychenxyz/vit-interpret 16 Oct 2023

Large-scale pre-trained vision foundation models, such as CLIP, have become de facto backbones for various vision tasks.
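A minimal sketch of the CLIP-style scoring such backbones provide: images and texts are embedded into a shared space and compared by cosine similarity. Random vectors stand in for real CLIP embeddings here, and all names are illustrative, not the paper's code.

```python
import numpy as np

def cosine_sim(query, candidates):
    # Cosine similarity between one vector and each row of a matrix.
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=-1, keepdims=True)
    return candidates @ query

rng = np.random.default_rng(0)
dim = 512                               # CLIP ViT-B/32 embedding width
image_emb = rng.normal(size=dim)        # stand-in for a real image embedding
text_embs = rng.normal(size=(3, dim))   # stand-ins for encoded captions
labels = ["a dog", "a cat", "a car"]

scores = cosine_sim(image_emb, text_embs)
best = labels[int(np.argmax(scores))]
print(best, scores.round(3))
```

The same nearest-neighbour search against text embeddings is the basic operation behind mapping a vision model's internal representations to text explanations.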


Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

joyjayng/Bongard-OpenWorld 16 Oct 2023

We also devise a neuro-symbolic reasoning approach that combines LLMs and VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems.
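A toy sketch of the symbolic half of such a pipeline, under the assumption that a VLM has already reduced each image to a set of concept attributes: induce the predicate shared by all positive examples and absent from all negatives. The attribute names are hypothetical, not from the Bongard-OpenWorld dataset.

```python
def induce_rule(positives, negatives):
    """Find attributes present in every positive set and in no negative set.

    positives/negatives: lists of attribute sets, e.g. produced by a VLM
    captioner. Returns the candidate concept attributes as a frozenset.
    """
    common = set.intersection(*positives)               # shared by all positives
    seen_in_neg = set.union(*negatives) if negatives else set()
    return frozenset(common - seen_in_neg)

# Toy Bongard-style problem: every positive image contains a round object.
pos = [{"round", "red"}, {"round", "blue"}, {"round", "large"}]
neg = [{"square", "red"}, {"triangle", "blue"}]
rule = induce_rule(pos, neg)
print(rule)  # only "round" survives the intersection-minus-negatives step
```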


Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis

ellenzhuwang/implicit_vkood NeurIPS 2023

Deep network models are often purely inductive during both training and inference on unseen data.

21 Sep 2023

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

haozhezhao/mic 14 Sep 2023

In this paper, we address the limitation above by 1) introducing the vision-language Model with Multi-Modal In-Context Learning (MMICL), a new approach that allows the VLM to deal with multi-modal inputs efficiently; 2) proposing a novel context scheme to augment the in-context learning ability of the VLM; and 3) constructing the Multi-modal In-Context Learning (MIC) dataset, designed to enhance the VLM's ability to understand complex multi-modal prompts.
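A hedged sketch of what an interleaved multi-modal in-context prompt can look like: image placeholder tokens alternating with question–answer text. The `<image_i>` tokens and field layout here are illustrative, not MMICL's actual context scheme.

```python
def build_icl_prompt(examples, query):
    """Interleave image placeholders with text for in-context learning.

    examples: list of (question, answer) pairs, each tied to one image.
    query: the final question, tied to the last image.
    """
    parts = []
    for i, (question, answer) in enumerate(examples):
        parts.append(f"<image_{i}> Q: {question} A: {answer}")
    # The query image comes last, with its answer left open for the model.
    parts.append(f"<image_{len(examples)}> Q: {query} A:")
    return "\n".join(parts)

prompt = build_icl_prompt(
    [("How many dogs are shown?", "Two"),
     ("What color is the ball?", "Red")],
    "What is the animal doing?",
)
print(prompt)
```

A real system would replace each placeholder token with the corresponding image's visual features before feeding the sequence to the VLM.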


Collecting Visually-Grounded Dialogue with A Game Of Sorts

willemsenbram/a-game-of-sorts LREC 2022

We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call "A Game Of Sorts".

10 Sep 2023

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

yangyi-chen/cotconsistency 8 Sep 2023

Based on this pipeline and the existing coarse-grained annotated dataset, we build the CURE benchmark to measure both the zero-shot reasoning performance and consistency of VLMs.
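The consistency side of such a benchmark can be sketched as agreement between a model's direct answers and the answers its chain-of-thought reasoning arrives at. The record fields below are illustrative, not CURE's actual schema.

```python
def consistency_rate(records):
    """Fraction of items where the direct answer matches the CoT answer.

    records: list of dicts with 'direct' and 'cot' answer strings, as might
    be collected from two prompting modes of the same VLM on the same item.
    """
    if not records:
        return 0.0
    agree = sum(r["direct"].strip().lower() == r["cot"].strip().lower()
                for r in records)
    return agree / len(records)

toy = [
    {"direct": "Two", "cot": "two"},    # consistent (case-insensitive)
    {"direct": "Red", "cot": "blue"},   # inconsistent
    {"direct": "Yes", "cot": "Yes"},    # consistent
    {"direct": "Dog", "cot": "dog"},    # consistent
]
print(consistency_rate(toy))  # → 0.75
```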


A Survey on Interpretable Cross-modal Reasoning

ZuyiZhou/Awesome-Interpretable-Cross-modal-Reasoning 5 Sep 2023

In recent years, cross-modal reasoning (CMR), the process of understanding and reasoning across different modalities, has emerged as a pivotal area with applications spanning from multimedia analysis to healthcare diagnostics.


Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

hypjudy/sparkles 31 Aug 2023

Our experiments validate the effectiveness of SparklesChat in understanding and reasoning across multiple images and dialogue turns.


An Examination of the Compositionality of Large Generative Vision-Language Models

teleema/sade 21 Aug 2023

A challenging new task is subsequently added to evaluate the robustness of GVLMs against their inherent inclination toward syntactical correctness.


VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control

henryhzy/vl-pet ICCV 2023

In particular, our VL-PET-large with lightweight PET module designs significantly outperforms VL-Adapter by 2.92% (3.41%) and LoRA by 3.37% (7.03%) with BART-base (T5-base) on image-text tasks.
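A hedged numpy sketch of the bottleneck-adapter idea behind such parameter-efficient tuning (PET) modules: a small down-projection/up-projection pair added residually to a frozen backbone's hidden states. The dimensions and zero-initialization choice are illustrative, not VL-PET's exact design.

```python
import numpy as np

class BottleneckAdapter:
    """Lightweight residual adapter: h + scale * up(relu(down(h)))."""

    def __init__(self, hidden_dim, bottleneck_dim, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Only these two small matrices would be trained; the backbone stays frozen.
        self.down = rng.normal(scale=0.02, size=(hidden_dim, bottleneck_dim))
        self.up = np.zeros((bottleneck_dim, hidden_dim))  # zero-init: identity at start
        self.scale = scale

    def __call__(self, h):
        return h + self.scale * np.maximum(h @ self.down, 0.0) @ self.up

adapter = BottleneckAdapter(hidden_dim=768, bottleneck_dim=64)
h = np.random.default_rng(1).normal(size=(4, 768))  # a batch of hidden states
out = adapter(h)
print(out.shape)  # (4, 768) — shape preserved
```

Zero-initializing the up-projection makes the adapter a no-op before training, so tuning starts from the frozen backbone's behavior; varying where and at what width such modules are inserted is one way to realize the "granularity control" in the title.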

18 Aug 2023