CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning

5 Jul 2022 · Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C. H. Hoi

Program synthesis, or code generation, aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical limitations. In particular, they often follow a standard supervised fine-tuning procedure, training a code generation model only on pairs of natural-language problem descriptions and ground-truth programs. Such a paradigm largely ignores important signals in the problem specification, such as unit tests, and thus often performs poorly on complex unseen coding tasks. To address these limitations, we propose "CodeRL", a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training we treat the code-generating LM as an actor network and introduce a critic network trained to predict the functional correctness of generated programs and to provide dense feedback signals to the actor. During inference, we introduce a new generation procedure with a critical sampling strategy that allows a model to automatically regenerate programs based on feedback from example unit tests and critic scores. For the model backbones, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives, larger model sizes, and better pretraining data. Our method not only achieves new SOTA results on the challenging APPS benchmark, but also shows strong zero-shot transfer capability, with new SOTA results on the simpler MBPP benchmark.
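The inference-time procedure can be pictured with a minimal sketch. Everything below is illustrative: `sample_programs`, `critic_score`, and `run_example_tests` are hypothetical stubs standing in for the fine-tuned code LM, the trained critic, and a sandboxed test harness, and the sketch captures only the high-level loop described in the abstract, not the paper's full refine-and-repair procedure.

```python
import random
from typing import List, Optional

def run_example_tests(program: str) -> bool:
    """Stub: check a candidate against the problem's example unit tests
    (a real harness would execute the program in a sandbox)."""
    return "a + b" in program  # toy stand-in for actual test execution

def critic_score(program: str) -> float:
    """Stub for a trained critic estimating the probability that the
    program is functionally correct."""
    return random.random()

def sample_programs(prompt: str, n: int,
                    seed: Optional[str] = None) -> List[str]:
    """Stub generator standing in for the fine-tuned code LM; `seed`
    would condition regeneration on a previous candidate."""
    pool = ["def add(a, b):\n    return a - b",
            "def add(a, b):\n    return a + b"]
    return [random.choice(pool) for _ in range(n)]

def critic_guided_generation(prompt: str, n: int = 8,
                             rounds: int = 3) -> Optional[str]:
    """Sample candidates; if none passes the example tests, keep the
    candidate the critic rates highest and regenerate from it."""
    seed = None
    for _ in range(rounds):
        candidates = sample_programs(prompt, n, seed=seed)
        passing = [p for p in candidates if run_example_tests(p)]
        if passing:
            return passing[0]
        seed = max(candidates, key=critic_score)
    return None

print(critic_guided_generation("Write add(a, b) that returns the sum."))
```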

Results from the Paper


Task: Code Generation · Dataset: APPS. Values in parentheses (#n) are global ranks on the APPS leaderboard.

| Model | Difficulty | Pass@1 | Pass@5 | Pass@1000 | Pass@any |
|---|---|---|---|---|---|
| CodeRL+CodeT5 | Introductory | 6.77% (#6) | 15.27% (#2) | 38.10% (#1) | 38.10% (#2) |
| | Interview | 1.80% (#6) | 4.48% (#2) | 14.33% (#1) | 14.33% (#2) |
| | Competition | 0.69% (#6) | 2.36% (#2) | 15.70% (#1) | 15.70% (#1) |
| GPT-J 6B (Finetuned) | Introductory | 5.60% (#7) | 9.20% (#4) | 35.20% (#2) | 35.20% (#3) |
| | Interview | 1.00% (#7) | 1.73% (#3) | 13.15% (#2) | 13.15% (#3) |
| | Competition | 0.50% (#7) | 1.00% (#3) | 13.51% (#2) | 13.51% (#3) |
| GPT-Neo 2.7B (Finetuned) | Introductory | 3.90% (#9) | 5.50% (#5) | 27.90% (#3) | 27.90% (#4) |
| | Interview | 0.57% (#9) | 0.80% (#5) | 9.83% (#3) | 9.83% (#4) |
| | Competition | 0.00% (#9) | 0.00% (#5) | 11.40% (#3) | 11.40% (#4) |
| GPT2 1.5B (Finetuned) | Introductory | 1.30% (#11) | 3.60% (#7) | 25.00% (#5) | 25.00% (#6) |
| | Interview | 0.70% (#8) | 1.03% (#4) | 9.27% (#4) | 9.27% (#6) |
| | Competition | 0.00% (#9) | 0.00% (#5) | 8.80% (#4) | 8.80% (#5) |
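For reference, Pass@k numbers like these are commonly computed with the unbiased estimator popularized by the Codex evaluation (Chen et al., 2021): generate n samples per problem, count the c that pass all unit tests, and estimate the probability that at least one of k randomly drawn samples is correct. A minimal sketch (the function name is ours):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn without replacement from n generations with c correct ones,
    passes all unit tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 1000 generations, 20 of them correct, budget k = 5.
print(f"pass@5 = {pass_at_k(1000, 20, 5):.4f}")  # ≈ 0.096
```

Note that Pass@1000 and Pass@any coincide in the table above, consistent with a generation budget of 1000 samples per problem, in which case Pass@any is simply the fraction of problems solved by at least one generated sample.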

Methods