TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE (%)	GLOBAL RANK
Automated Theorem Proving	miniF2F-test	Lean GPT-f	Pass@1	24.6	# 7
Automated Theorem Proving	miniF2F-test	Lean GPT-f	Pass@8	29.2	# 2
Automated Theorem Proving	miniF2F-test	Lean tidy	Pass@1	18	# 11
Automated Theorem Proving	miniF2F-test	Metamath GPT-f	Pass@1	1.3	# 14
Automated Theorem Proving	miniF2F-test	Metamath GPT-f	Pass@8	1.6	# 3
Automated Theorem Proving	miniF2F-valid	Lean GPT-f	Pass@8	29.3	# 1
Automated Theorem Proving	miniF2F-valid	Lean GPT-f	Pass@1	23.9	# 1
Automated Theorem Proving	miniF2F-valid	Lean tidy	Pass@1	16.8	# 2
Automated Theorem Proving	miniF2F-valid	Metamath GPT-f	Pass@8	2	# 2
Automated Theorem Proving	miniF2F-valid	Metamath GPT-f	Pass@1	1	# 3

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/minif2f-a-cross-system-benchmark-for-formal/automated-theorem-proving-on-minif2f-valid)](https://paperswithcode.com/sota/automated-theorem-proving-on-minif2f-valid?p=minif2f-a-cross-system-benchmark-for-formal)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/minif2f-a-cross-system-benchmark-for-formal/automated-theorem-proving-on-minif2f-test)](https://paperswithcode.com/sota/automated-theorem-proving-on-minif2f-test?p=minif2f-a-cross-system-benchmark-for-formal)`

MiniF2F: a cross-system benchmark for formal Olympiad-level mathematics

ICLR 2022 · Kunhao Zheng, Jesse Michael Han, Stanislas Polu

We present miniF2F, a dataset of formal Olympiad-level mathematics problem statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, Isabelle (partially), and HOL Light (partially) and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses. We report baseline results using GPT-f, a neural theorem prover based on GPT-3, and provide an analysis of its performance. We intend for miniF2F to be a community-driven effort and hope that our benchmark will help spur advances in neural theorem proving.
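
To give a sense of what a "formal problem statement" looks like, here is a minimal sketch in Lean. The theorem below is a hypothetical example written for illustration and is not taken from the dataset. It assumes Lean 4 with Mathlib, which may differ from the Lean version and library the benchmark files themselves use.

```lean
import Mathlib

-- Hypothetical statement in the style of a high-school algebra problem
-- (assumption: not an actual miniF2F entry).
-- The benchmark supplies statements like the `theorem ... : x = 4` line;
-- a prover is asked to produce the proof that follows `:= by`.
theorem illustrative_linear_equation (x : ℝ) (h : 2 * x + 3 = 11) : x = 4 := by
  linarith
```

Here the hypothesis `h` encodes the given equation and the goal `x = 4` encodes the answer to be proved; `linarith` is a Mathlib tactic for linear arithmetic that closes goals of this kind.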


Code

openai/minif2f (official, 256 stars)

facebookresearch/minif2f

rah4927/lean-dojo-mew

Tasks

Automated Theorem Proving

Datasets

Introduced in the Paper:

MiniF2F

Used in the Paper:

MATH

NaturalProofs

Results from the Paper

Ranked #1 on Automated Theorem Proving on miniF2F-valid (using extra training data)
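
The Pass@1 and Pass@8 figures reported for this paper can be read as "percentage of problems solved within k attempts". A minimal sketch of that computation follows. It assumes a problem counts as solved when at least one of its first k independent proof attempts succeeds; the paper's exact evaluation protocol may differ. The function name `pass_at_k` and the toy outcomes are hypothetical and are not taken from the paper or its repositories.

```python
from typing import Mapping, Sequence


def pass_at_k(outcomes: Mapping[str, Sequence[bool]], k: int) -> float:
    """Percentage of problems solved by at least one of the first k attempts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not outcomes:
        raise ValueError("outcomes must contain at least one problem")
    # A problem is solved if any of its first k attempts produced a valid proof.
    solved = sum(1 for attempts in outcomes.values() if any(attempts[:k]))
    return 100.0 * solved / len(outcomes)


# Hypothetical outcomes: problem names and results are made up for illustration.
toy_outcomes = {
    "problem_a": [True, False, False, False, False, False, False, False],
    "problem_b": [False, False, True, False, False, False, False, False],
    "problem_c": [False] * 8,
    "problem_d": [False] * 8,
}

print(pass_at_k(toy_outcomes, k=1))  # 25.0
print(pass_at_k(toy_outcomes, k=8))  # 50.0
```

Under this reading, Pass@8 can never be lower than Pass@1 on the same set of outcomes, since the first attempt is included in the first eight.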

Methods

Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • Fixed Factorized Attention • GELU • GPT-3 • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Strided Attention • Weight Decay
