INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair

16 Nov 2023  ·  Hanbin Wang, Zhenghao Liu, Shuo Wang, Ganqu Cui, Ning Ding, Zhiyuan Liu, Ge Yu

This paper introduces INTERVENOR (INTERactiVE chaiN Of Repair), a system designed to emulate the interactive code repair process observed in humans, encompassing both code diagnosis and code repair. INTERVENOR prompts Large Language Models (LLMs) to play distinct roles during the repair process, acting as both a Code Learner and a Code Teacher. Specifically, the Code Learner follows instructions to generate or repair code, while the Code Teacher crafts a Chain-of-Repair (CoR) that guides the Code Learner. When crafting the CoR, the Code Teacher checks the code generated by the Code Learner and reassesses how to address its bugs based on error feedback from compilers. Experimental results demonstrate that INTERVENOR surpasses baseline models, with improvements of approximately 18% and 4.3% over GPT-3.5 on code generation and code translation tasks, respectively. Further analyses show that the CoR effectively illuminates the reasons behind bugs and outlines solution plans in natural language. With feedback from code compilers, INTERVENOR can accurately identify syntax errors and assertion errors and provide precise instructions for repairing code. All data and code are available at https://github.com/NEUIR/INTERVENOR
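The two-role loop described in the abstract is straightforward to picture in code. The sketch below is a rough illustration under stated assumptions, not the authors' implementation (see the GitHub repository for that): `call_llm`, the prompt wording, and the `max_rounds` budget are hypothetical placeholders standing in for whatever LLM client and prompts are used.

```python
import subprocess
import sys
import tempfile
from typing import Optional


def call_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g., to GPT-3.5)."""
    raise NotImplementedError("plug in your LLM client here")


def run_candidate(code: str, tests: str) -> Optional[str]:
    """Run candidate code plus its tests; return the compiler/runtime
    error text, or None if all assertions pass."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    result = subprocess.run(
        [sys.executable, path], capture_output=True, text=True, timeout=10
    )
    return None if result.returncode == 0 else result.stderr


def chain_of_repair(task: str, tests: str, max_rounds: int = 5) -> str:
    """One Learner/Teacher repair loop, in the spirit of the paper."""
    # Code Learner: generate an initial solution from the task description.
    code = call_llm(f"Write a Python solution for:\n{task}")
    for _ in range(max_rounds):
        feedback = run_candidate(code, tests)
        if feedback is None:
            break  # all tests pass, nothing left to repair
        # Code Teacher: read the compiler/test feedback and write a
        # Chain-of-Repair: why the code fails and how to fix it.
        cor = call_llm(
            "Explain why this code fails and plan the repair.\n"
            f"Code:\n{code}\nError feedback:\n{feedback}"
        )
        # Code Learner: repair the code following the CoR guidance.
        code = call_llm(
            f"Repair the code using this guidance:\n{cor}\nCode:\n{code}"
        )
    return code
```

The key design point this sketch captures is that the compiler's raw error output is never handed directly back to the generator; it is first rewritten by the Teacher into a natural-language diagnosis and repair plan, which the abstract reports is what drives the accuracy gains.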


Results from the Paper


| Task            | Dataset   | Model                       | Metric   | Value | Global Rank |
|-----------------|-----------|-----------------------------|----------|-------|-------------|
| Code Generation | HumanEval | GPT-3.5 Turbo (zero-shot)   | Pass@1   | 60.3  | #35         |
| Code Generation | HumanEval | GPT-3.5 Turbo (few-shot)    | Pass@1   | 62.2  | #31         |
| Code Generation | HumanEval | INTERVENOR (GPT-3.5)        | Pass@1   | 75.6  | #16         |
| Code Generation | MBPP      | GPT-3.5 Turbo (zero-shot)   | Accuracy | 39.8  | #70         |
| Code Generation | MBPP      | GPT-3.5 Turbo (few-shot)    | Accuracy | 45.4  | #62         |
| Code Generation | MBPP      | GPT-3.5 Turbo + INTERVENOR  | Accuracy | 69.8  | #19         |
