CodeT5+: Open Code Large Language Models for Code Understanding and Generation

13 May 2023 · Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D. Q. Bui, Junnan Li, Steven C. H. Hoi

Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for all downstream tasks. The former paradigm is limited by inflexibility in applications, while in the latter, the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of them. Second, they often employ a limited set of pretraining objectives which might not be relevant to some downstream tasks and hence result in substantial performance degradation. To address these limitations, we propose "CodeT5+", a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives, which mitigates the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs rather than training from scratch to efficiently scale up our models, and explore instruction tuning to align with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction tuning. We observe state-of-the-art (SoTA) model performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval. In particular, our instruction-tuned CodeT5+ 16B achieves new SoTA results on the HumanEval code generation task against other open code LLMs.
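To make one of these objectives concrete, below is a minimal sketch of T5-style span denoising applied to code. It is not the authors' released code: it assumes whitespace-tokenized input and T5-style `<extra_id_*>` sentinel tokens, and the masking hyperparameters are illustrative rather than the paper's settings.

```python
import random

def span_denoise(tokens, mask_rate=0.15, mean_span=3, seed=0):
    """T5-style span corruption: returns (corrupted input, denoising target).

    Illustrative sketch only; mask_rate and mean_span are assumptions,
    not the hyperparameters used for CodeT5+.
    """
    rng = random.Random(seed)
    inp, tgt = [], []
    i, sid = 0, 0
    while i < len(tokens):
        # Start a masked span with probability mask_rate / mean_span,
        # so roughly mask_rate of all tokens end up masked on average.
        if rng.random() < mask_rate / mean_span:
            span = max(1, round(rng.expovariate(1.0 / mean_span)))
            sentinel = f"<extra_id_{sid}>"
            sid += 1
            inp.append(sentinel)           # encoder sees only the sentinel
            tgt.append(sentinel)           # decoder reconstructs the span
            tgt.extend(tokens[i:i + span])
            i += span
        else:
            inp.append(tokens[i])
            i += 1
    tgt.append(f"<extra_id_{sid}>")        # closing sentinel, T5 convention
    return " ".join(inp), " ".join(tgt)

code = "def add ( a , b ) : return a + b".split()
src, tgt = span_denoise(code)
print(src)  # e.g. "def add ( a , <extra_id_0> : return a + b"
print(tgt)  # e.g. "<extra_id_0> b ) <extra_id_1>"
```

The contrastive, text-code matching, and causal LM objectives are trained alongside this one, which is what lets the encoder, decoder, or both be reused downstream.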


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Code Search | CodeSearchNet | CodeT5+ 770M | Overall | 77.4 | #3 |
| Code Search | CodeSearchNet | CodeT5+ 770M | Go | 92.7 | #3 |
| Code Search | CodeSearchNet | CodeT5+ 770M | Ruby | 78.0 | #3 |
| Code Search | CodeSearchNet | CodeT5+ 770M | Python | 75.8 | #5 |
| Code Search | CodeSearchNet | CodeT5+ 770M | Java | 76.2 | #4 |
| Code Search | CodeSearchNet | CodeT5+ 770M | JS | 71.3 | #4 |
| Code Search | CodeSearchNet | CodeT5+ 770M | PHP | 70.1 | #5 |
| Code Search | CodeSearchNet | CodeT5+ 220M | Overall | 77.1 | #5 |
| Code Search | CodeSearchNet | CodeT5+ 220M | Go | 92.4 | #4 |
| Code Search | CodeSearchNet | CodeT5+ 220M | Ruby | 77.7 | #4 |
| Code Search | CodeSearchNet | CodeT5+ 220M | Python | 75.6 | #6 |
| Code Search | CodeSearchNet | CodeT5+ 220M | Java | 76.1 | #5 |
| Code Search | CodeSearchNet | CodeT5+ 220M | JS | 70.8 | #6 |
| Code Search | CodeSearchNet | CodeT5+ 220M | PHP | 69.8 | #6 |
| Code Search | CodeXGLUE - AdvTest | CodeT5+ 220M | MRR | 43.3 | #2 |
| Code Search | CodeXGLUE - AdvTest | CodeT5+ 770M | MRR | 44.7 | #1 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 770M | Ruby | 15.63 | #2 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 770M | Javascript | 17.93 | #1 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 770M | Python | 20.47 | #2 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 770M | Java | 20.83 | #2 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 770M | PHP | 26.39 | #3 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 220M | Ruby | 15.51 | #3 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 220M | Javascript | 16.27 | #2 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 220M | Go | 19.60 | #2 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 220M | Python | 20.16 | #4 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 220M | Java | 20.53 | #3 |
| Code Summarization | CodeXGLUE - CodeSearchNet | CodeT5+ 220M | PHP | 26.78 | #1 |
| Code Completion | CodeXGLUE - Github Java Corpus | CodeT5+ 220M | EM (line-level) | 35.17 | #2 |
| Code Completion | CodeXGLUE - Github Java Corpus | CodeT5+ 220M | Edit Sim (line-level) | 69.48 | #2 |
| Code Completion | CodeXGLUE - Github Java Corpus | CodeT5+ 770M | EM (line-level) | 37.90 | #1 |
| Code Completion | CodeXGLUE - Github Java Corpus | CodeT5+ 770M | Edit Sim (line-level) | 72.25 | #1 |
| Code Completion | CodeXGLUE - PY150 | CodeT5+ 770M | EM (line-level) | 44.86 | #1 |
| Code Completion | CodeXGLUE - PY150 | CodeT5+ 770M | Edit Sim (line-level) | 74.22 | #1 |
| Code Completion | CodeXGLUE - PY150 | CodeT5+ 220M | EM (line-level) | 43.42 | #2 |
| Code Completion | CodeXGLUE - PY150 | CodeT5+ 220M | Edit Sim (line-level) | 73.69 | #2 |
| Arithmetic Reasoning | GSM8K | CodeT5+ | Accuracy | 73.8 | #84 |
| Arithmetic Reasoning | GSM8K | CodeT5+ | Parameters (Billion) | 0.77 | #6 |
| Code Generation | HumanEval | CodeT5+ 220M (zero-shot) | Pass@1 | 12.0 | #118 |
| Code Generation | HumanEval | CodeT5+ 770M (zero-shot) | Pass@1 | 15.5 | #110 |
| Code Generation | HumanEval | CodeT5+ 2B (zero-shot) | Pass@1 | 24.2 | #90 |
| Code Generation | HumanEval | CodeT5+ 6B (zero-shot) | Pass@1 | 28.0 | #86 |
| Code Generation | HumanEval | CodeT5+ 16B (zero-shot) | Pass@1 | 30.9 | #76 |
| Code Generation | HumanEval | CodeT5+ 16B (CodeT) | Pass@1 | 38.5 | #63 |
| Code Generation | HumanEval | InstructCodeT5+ 16B (CodeT) | Pass@1 | 42.9 | #57 |
| Code Generation | HumanEval | InstructCodeT5+ 16B (zero-shot) | Pass@1 | 35.0 | #70 |
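Pass@k on HumanEval is conventionally reported with the unbiased estimator introduced in the Codex paper (Chen et al., 2021): generate n samples per problem, count the c that pass the unit tests, and compute the probability that at least one of k draws is correct. A minimal sketch follows; the sample counts in the usage example are illustrative, not taken from this paper.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k).

    n: total samples generated for a problem
    c: samples that pass the unit tests
    k: number of draws considered
    """
    if n - c < k:
        return 1.0  # every size-k draw contains a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 200 samples per problem, 62 passing.
print(pass_at_k(200, 62, 1))   # 0.31, i.e. Pass@1 of 31.0
print(pass_at_k(200, 62, 10))  # probability a batch of 10 contains a pass
```

For k = 1 this reduces to c / n, the fraction of passing samples, but the combinatorial form gives a lower-variance estimate for larger k than naively resampling.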
