Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks

5 Jan 2024 · Haoyuan Wu, Haisheng Zheng, Zhuolun He, Bei Yu

Large Language Models (LLMs) have demonstrated considerable proficiency in general natural language processing (NLP) tasks. Instruction tuning, a successful paradigm, enhances the ability of LLMs to follow natural language instructions and exhibit robust generalization across a wide range of tasks. However, these models often encounter performance limitations across multiple tasks due to constrained model capacity, and expanding this capacity during the instruction tuning phase poses significant challenges. To address this issue, we introduce a novel approach, Parameter-Efficient Sparsity Crafting (PESC), which transitions dense models to sparse models using a Mixture-of-Experts (MoE) architecture. PESC integrates adapters into the MoE layers of sparse models, differentiating experts without altering the individual weights within these layers. This method significantly reduces computational costs and GPU memory requirements, facilitating model capacity expansion through a minimal increase in parameters via the inserted adapters. Our empirical evaluation demonstrates the effectiveness of the PESC method. Using PESC during instruction tuning, our sparse models, dubbed Camelidae, outperform all other open-source sparse models and exhibit superior general capabilities compared to GPT-3.5.
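To make the core idea concrete, the sketch below illustrates one possible reading of a PESC-style MoE layer in PyTorch: all experts reuse the frozen FFN weights copied from the dense model, and only a small per-expert bottleneck adapter and the router are trainable. This is a minimal illustration, not the authors' implementation; the class and parameter names (PESCMoELayer, Adapter, bottleneck_dim, top_k) are assumptions for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Adapter(nn.Module):
    """Lightweight bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.gelu(self.down(x)))


class PESCMoELayer(nn.Module):
    """Illustrative sparse MoE layer: experts share one frozen dense FFN and
    differ only through small trainable adapters (hypothetical sketch)."""
    def __init__(self, hidden_dim: int, ffn_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Shared FFN, assumed to be copied from the dense model and kept frozen.
        self.shared_ffn = nn.Sequential(
            nn.Linear(hidden_dim, ffn_dim), nn.GELU(), nn.Linear(ffn_dim, hidden_dim)
        )
        for p in self.shared_ffn.parameters():
            p.requires_grad = False
        # Trainable components: per-expert adapters and the router.
        self.adapters = nn.ModuleList(Adapter(hidden_dim) for _ in range(num_experts))
        self.router = nn.Linear(hidden_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, hidden_dim)
        shared_out = self.shared_ffn(x)  # identical base output for every expert
        weights, idx = torch.topk(self.router(x), self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, adapter in enumerate(self.adapters):
                mask = idx[:, slot] == e
                if mask.any():
                    # Experts are distinguished only by their adapter on the shared FFN output.
                    out[mask] += weights[mask, slot:slot + 1] * adapter(shared_out[mask])
        return out


# Example usage: route 16 tokens of width 512 through 8 adapter-differentiated experts.
layer = PESCMoELayer(hidden_dim=512, ffn_dim=2048)
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```

Because the shared FFN is frozen and each adapter adds only two small linear layers, the trainable-parameter count grows with the bottleneck width rather than with the full FFN size, which is the parameter-efficiency argument of the abstract.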


Results from the Paper


Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Common Sense Reasoning | ARC (Challenge) | Camelidae-8×34B | Accuracy | 65.2 | # 16
Common Sense Reasoning | ARC (Easy) | Camelidae-8×34B | Accuracy | 86.2 | # 4
Arithmetic Reasoning | GSM8K | Camelidae-8×34B (5-shot) | Accuracy | 78.3 | # 70
Arithmetic Reasoning | GSM8K | Qwen2idae-16x14B (5-shot) | Accuracy | 77.8 | # 71
Sentence Completion | HellaSwag | Camelidae-8×34B (10-shot) | Accuracy | 83.2 | # 30
Sentence Completion | HellaSwag | Qwen2idae-16x14B (10-shot) | Accuracy | 82.3 | # 34
Math Word Problem Solving | MATH | Qwen2idae-16x14B (4-shot) | Accuracy | 29.9 | # 63
Math Word Problem Solving | MATH | Camelidae-8×34B (4-shot) | Accuracy | 22.6 | # 74
Code Generation | MBPP | Camelidae-8×34B (4-shot) | Accuracy | 41.4 | # 68
Code Generation | MBPP | Qwen2idae-16x14B (4-shot) | Accuracy | 48.6 | # 52
Multi-task Language Understanding | MMLU | Camelidae-8×34B (5-shot) | Average (%) | 75.6 | # 17
Multi-task Language Understanding | MMLU | Qwen2idae-16x14B (5-shot) | Average (%) | 66.7 | # 39
Question Answering | PIQA | Camelidae-8×34B | Accuracy | 82.7 | # 14
Common Sense Reasoning | WinoGrande | Camelidae-8×34B | Accuracy | 80.9 | # 13
