Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset
Recent advancements in Large Multimodal Models (LMMs) have shown promising results in mathematical reasoning within visual contexts, with models approaching human-level performance on existing benchmarks such as MathVista. However, we observe significant limitations in the diversity of questions and breadth of subjects covered by these benchmarks. To address this issue, we present the MATH-Vision (MATH-V) dataset, a meticulously curated collection of 3,040 high-quality mathematical problems with visual contexts sourced from real math competitions. Spanning 16 distinct mathematical disciplines and graded across 5 levels of difficulty, our dataset provides a comprehensive and diverse set of challenges for evaluating the mathematical reasoning abilities of LMMs. Through extensive experimentation, we unveil a notable performance gap between current LMMs and human performance on MATH-V, underscoring the imperative for further advancements in LMMs. Moreover, our detailed categorization allows for a thorough error analysis of LMMs, offering valuable insights to guide future research and development. The project is available at https://mathvision-cuhk.github.io
Results from the Paper
| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Multimodal Reasoning | MATH-V | GPT4V | Accuracy | 22.76 | #1 |
| Multimodal Reasoning | MATH-V | Gemini Pro | Accuracy | 17.66 | #2 |
| Multimodal Reasoning | MATH-V | Qwen-VL-Max | Accuracy | 15.59 | #3 |
| Multimodal Reasoning | MATH-V | InternLM-XComposer2-VL | Accuracy | 14.54 | #4 |
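For context on how accuracy figures like those above are typically computed, here is a minimal scoring sketch in Python. It assumes a hypothetical `predictions.jsonl` file in which each line carries `subject`, `level`, `answer`, and `prediction` fields; those field names, the exact-match scoring rule, and the file path are illustrative assumptions, not the official MATH-V evaluation pipeline.

```python
import json
from collections import defaultdict

def load_jsonl(path):
    """Read one JSON record per line (hypothetical predictions file)."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def accuracy_report(records):
    """Compute overall, per-subject, and per-level accuracy.

    Each record is assumed to carry `subject`, `level`, `answer`
    (ground truth), and `prediction` (model output); these field
    names are assumptions, not the dataset's actual schema.
    """
    overall = [0, 0]
    by_subject = defaultdict(lambda: [0, 0])
    by_level = defaultdict(lambda: [0, 0])
    for r in records:
        # Simple exact-match scoring; real graders often normalize answers.
        correct = r["prediction"].strip() == r["answer"].strip()
        for bucket in (overall, by_subject[r["subject"]], by_level[r["level"]]):
            bucket[0] += int(correct)
            bucket[1] += 1
    pct = lambda c, n: 100.0 * c / n if n else 0.0
    return {
        "overall": pct(*overall),
        "by_subject": {k: pct(*v) for k, v in by_subject.items()},
        "by_level": {k: pct(*v) for k, v in by_level.items()},
    }

if __name__ == "__main__":
    preds = load_jsonl("predictions.jsonl")  # hypothetical path
    print(json.dumps(accuracy_report(preds), indent=2))
```

Because MATH-V labels every problem with one of 16 subjects and a difficulty level from 1 to 5, reporting per-subject and per-level accuracy alongside the overall score is what enables the kind of error analysis the paper describes.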