OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

20 Sep 2023  ·  Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu ·

Open-source large language models such as LLaMA have recently emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data of mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data-quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate model generalization, on which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat. Our code, data, and models are publicly available at https://github.com/imoneoi/openchat and https://huggingface.co/openchat.
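As a rough illustration of the single-stage, RL-free objective the abstract describes, the sketch below conditions each prompt on its data source and scales a standard SFT loss by a coarse-grained, source-dependent reward. The class tags, class weights, and base model name are illustrative assumptions, not the authors' released training recipe; prompt-token masking and optimizer details are omitted for brevity.

```python
# Minimal sketch of a class-conditioned, reward-weighted SFT step in the spirit of C-RLFT.
# CLASS_WEIGHTS, CLASS_TAGS, and the base model name are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Coarse-grained "rewards": expert data (e.g., GPT-4 conversations) is weighted
# higher than sub-optimal data (e.g., GPT-3.5 conversations).
CLASS_WEIGHTS = {"expert": 1.0, "suboptimal": 0.1}                       # assumed values
CLASS_TAGS = {"expert": "GPT4 User:", "suboptimal": "GPT3 User:"}        # assumed source tags

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")   # assumed base model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")

def weighted_sft_loss(batch):
    """One training step: prepend a source-specific tag to the prompt and scale
    the token-level cross-entropy by that source's coarse-grained reward."""
    losses = []
    for example in batch:  # each example: {"class": ..., "prompt": ..., "response": ...}
        text = f'{CLASS_TAGS[example["class"]]} {example["prompt"]}\nAssistant: {example["response"]}'
        ids = tokenizer(text, return_tensors="pt").input_ids
        # Standard causal-LM loss; labels are shifted internally by the model.
        # (A fuller version would mask prompt tokens out of the loss.)
        out = model(input_ids=ids, labels=ids)
        losses.append(CLASS_WEIGHTS[example["class"]] * out.loss)
    return torch.stack(losses).mean()
```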


Results from the Paper


Task                        Dataset     Model                  Metric                  Value   Global Rank
Arithmetic Reasoning        GSM8K       OpenChat-3.5 7B        Accuracy                77.3    #73
                                                               Parameters (Billions)   7       #10
Code Generation             HumanEval   OpenChat-3.5 7B        Pass@1                  55.5    #39
Code Generation             HumanEval   OpenChat-3.5-1210 7B   Pass@1                  68.9    #24
Math Word Problem Solving   MATH        OpenChat-3.5 7B        Accuracy                28.6    #66
                                                               Parameters (Billions)   7       #58
Math Word Problem Solving   MATH        OpenChat-3.5-1210 7B   Accuracy                28.9    #65
                                                               Parameters (Billions)   7       #58
