TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Reasoning	Bongard-OpenWorld	Human	2-Class Accuracy	91.0	# 1
Visual Reasoning	Bongard-OpenWorld	Otter	2-Class Accuracy	49.3	# 6
Visual Reasoning	Bongard-OpenWorld	ChatCaptioner + ChatGPT	2-Class Accuracy	49.3	# 6
Visual Reasoning	Bongard-OpenWorld	InstructBLIP + ChatGPT + Neuro-Symbolic	2-Class Accuracy	55.5	# 5
Visual Reasoning	Bongard-OpenWorld	BLIP-2 + ChatGPT (Fine-tuned)	2-Class Accuracy	63.3	# 4
Visual Reasoning	Bongard-OpenWorld	InstructBLIP + GPT-4	2-Class Accuracy	63.8	# 3
Visual Reasoning	Bongard-OpenWorld	SNAIL	2-Class Accuracy	64.0	# 2

Bongard-OpenWorld: Few-Shot Reasoning for Free-form Visual Concepts in the Real World

16 Oct 2023 · Rujie Wu, Xiaojian Ma, Zhenliang Zhang, Wei Wang, Qing Li, Song-Chun Zhu, Yizhou Wang

We introduce Bongard-OpenWorld, a new benchmark for evaluating real-world few-shot reasoning for machine vision. It originates from the classical Bongard Problems (BPs): given two sets of images (positive and negative), the model must identify the set that each query image belongs to by inducing the visual concept that is depicted exclusively by images from the positive set. Our benchmark inherits the few-shot concept induction of the original BPs while adding two novel layers of challenge: 1) open-world free-form concepts, as the visual concepts in Bongard-OpenWorld are unique compositions of terms from an open vocabulary, ranging from object categories to abstract visual attributes and commonsense factual knowledge; 2) real-world images, as opposed to the synthetic diagrams used by many counterparts. In our exploration, Bongard-OpenWorld already poses a significant challenge to current few-shot reasoning algorithms. We further investigate to what extent recently introduced Large Language Models (LLMs) and Vision-Language Models (VLMs) can solve our task, by directly probing VLMs and by combining VLMs and LLMs in an interactive reasoning scheme. We also devise a neuro-symbolic reasoning approach that reconciles LLMs and VLMs with logical reasoning to emulate the human problem-solving process for Bongard Problems. However, none of these approaches manages to close the human-machine gap: the best learner achieves 64% accuracy, while human participants easily reach 91%. We hope Bongard-OpenWorld can help us better understand the limitations of current visual intelligence and facilitate future research on visual agents with stronger few-shot visual reasoning capabilities.
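
The task format described in the abstract (a positive set, a negative set, and a query to assign to one of them) and the 2-Class Accuracy metric reported on this page can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: images are stood in for by small feature vectors, `BongardProblem`, `nearest_centroid_solver`, and `two_class_accuracy` are hypothetical names, and the nearest-centroid rule is a placeholder baseline, not one of the methods evaluated in the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]  # hypothetical stand-in for an image embedding


@dataclass
class BongardProblem:
    positives: List[Vector]  # support images that depict the concept
    negatives: List[Vector]  # support images that do not depict it
    query: Vector            # image to be assigned to one of the two sets
    label: bool              # True if the query belongs to the positive set


def centroid(vectors: List[Vector]) -> List[float]:
    # Component-wise mean of a non-empty list of equal-length vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid_solver(problem: BongardProblem) -> bool:
    # Placeholder baseline: predict "positive" when the query lies closer
    # to the positive centroid than to the negative one.
    pos = centroid(problem.positives)
    neg = centroid(problem.negatives)
    return squared_distance(problem.query, pos) < squared_distance(problem.query, neg)


def two_class_accuracy(
    problems: List[BongardProblem],
    solver: Callable[[BongardProblem], bool],
) -> float:
    # Percentage of queries assigned to the correct set.
    correct = sum(solver(p) == p.label for p in problems)
    return 100.0 * correct / len(problems)


if __name__ == "__main__":
    support_pos = [[1.0, 1.0], [1.0, 0.8]]
    support_neg = [[-1.0, -1.0], [-0.8, -1.0]]
    demo = [
        BongardProblem(support_pos, support_neg, [0.9, 0.9], True),
        BongardProblem(support_pos, support_neg, [-0.9, -0.9], False),
    ]
    print(two_class_accuracy(demo, nearest_centroid_solver))
```

Because each query has exactly two candidate sets, chance performance on a balanced set of queries is 50%, which is a useful reference point when reading the accuracies listed for this benchmark.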

Code

joyjayng/Bongard-OpenWorld official

Tasks

Few-Shot Learning

Logical Reasoning

Visual Reasoning

Datasets

Introduced in the Paper:

Bongard-OpenWorld

Used in the Paper:

ImageNet

mini-Imagenet

Meta-Dataset

Bongard-HOI

Results from the Paper

Ranked #1 on Visual Reasoning on Bongard-OpenWorld

Methods

None
