Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

5 Jan 2024 · Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu

Large Language Models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across a wide range of tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity, and expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce a novel approach, Parameter-Efficient Sparsity Crafting (PESC), which transitions dense models to sparse models using a Mixture-of-Experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal increase in parameters via the inserted adapters. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our sparse models, dubbed Camelidae, outperform all other open-source sparse models and exhibit superior general capabilities compared to GPT-3.5.
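To make the core idea concrete, the sketch below illustrates one possible reading of a PESC-style MoE layer in PyTorch: all experts reuse the frozen FFN weights copied from the dense model, and only a small per-expert bottleneck adapter and the router are trainable. This is a minimal illustration, not the authors' implementation; the class and parameter names (PESCMoELayer, Adapter, bottleneck_dim, top_k) are assumptions for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Lightweight bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.gelu(self.down(x)))


class PESCMoELayer(nn.Module):
    """Illustrative sparse MoE layer: experts share one frozen dense FFN and
    differ only through small trainable adapters (hypothetical sketch)."""
    def __init__(self, hidden_dim: int, ffn_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Shared FFN, assumed to be copied from the dense model and kept frozen.
        self.shared_ffn = nn.Sequential(
            nn.Linear(hidden_dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, hidden_dim)
        )
        for p in self.shared_ffn.parameters():
            p.requires_grad = False
        # Trainable components: per-expert adapters and the router.
        self.adapters = nn.ModuleList(Adapter(hidden_dim) for _ in range(num_experts))
        self.router = nn.Linear(hidden_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_dim)
        shared_out = self.shared_ffn(x)  # identical base output for every expert
        weights, idx = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, adapter in enumerate(self.adapters):
                mask = idx[:, slot] == e
                if mask.any():
                    # Experts are distinguished only by their adapter on the shared FFN output.
                    out[mask] += weights[mask, slot:slot + 1] * adapter(shared_out[mask])
        return out


# Example usage: route 16 tokens of width 512 through 8 adapter-differentiated experts.
layer = PESCMoELayer(hidden_dim=512, ffn_dim=2048)
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

Because the shared FFN is frozen and each adapter adds only two small linear layers, the trainable-parameter count grows with the bottleneck width rather than with the full FFN size, which is the parameter-efficiency argument of the abstract.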


Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Common Sense Reasoning | ARC (Challenge) | Camelidae-8×34B | Accuracy | 65.2 | # 16
Common Sense Reasoning | ARC (Easy) | Camelidae-8×34B | Accuracy | 86.2 | # 4
Arithmetic Reasoning | GSM8K | Camelidae-8×34B (5-shot) | Accuracy | 78.3 | # 70
Arithmetic Reasoning | GSM8K | Qwen2idae-16x14B (5-shot) | Accuracy | 77.8 | # 71
Sentence Completion | HellaSwag | Camelidae-8×34B (10-shot) | Accuracy | 83.2 | # 30
Sentence Completion | HellaSwag | Qwen2idae-16x14B (10-shot) | Accuracy | 82.3 | # 34
Math Word Problem Solving | MATH | Qwen2idae-16x14B (4-shot) | Accuracy | 29.9 | # 63
Math Word Problem Solving | MATH | Camelidae-8×34B (4-shot) | Accuracy | 22.6 | # 74
Code Generation | MBPP | Camelidae-8×34B (4-shot) | Accuracy | 41.4 | # 68
Code Generation | MBPP | Qwen2idae-16x14B (4-shot) | Accuracy | 48.6 | # 52
Multi-task Language Understanding | MMLU | Camelidae-8×34B (5-shot) | Average (%) | 75.6 | # 17
Multi-task Language Understanding | MMLU | Qwen2idae-16x14B (5-shot) | Average (%) | 66.7 | # 39
Question Answering | PIQA | Camelidae-8×34B | Accuracy | 82.7 | # 14
Common Sense Reasoning | WinoGrande | Camelidae-8×34B | Accuracy | 80.9 | # 13
