Stay on topic with Classifier-Free Guidance

Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2, and LLaMA-family models across an array of tasks: Q&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements on difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation, we show a 75% preference for GPT4All using CFG over the baseline.
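
In language modeling, CFG amounts to blending two next-token distributions at decode time: one conditioned on the prompt and one without it. The sketch below, written against the Hugging Face transformers API, shows greedy decoding with the standard guidance formula logits = uncond + gamma * (cond - uncond). The gpt2 checkpoint, the gamma value, and the cfg_generate helper are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM works the same way; gpt2 is just a small example checkpoint.
model_name = "gpt2"
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def cfg_generate(prompt, gamma=1.5, max_new_tokens=40):
    """Greedy decoding with classifier-free guidance.

    At each step the next-token logits are blended as
        logits = logits_uncond + gamma * (logits_cond - logits_uncond),
    where the conditional pass sees the prompt and the unconditional
    pass sees only the generated continuation.
    """
    cond_ids = tok(prompt, return_tensors="pt").input_ids
    # Unconditional context: no prompt, seeded with the BOS token so the
    # model always has at least one token of context.
    uncond_ids = tok(tok.bos_token, return_tensors="pt").input_ids

    for _ in range(max_new_tokens):
        with torch.no_grad():
            cond_logits = model(cond_ids).logits[:, -1, :]
            uncond_logits = model(uncond_ids).logits[:, -1, :]
        logits = uncond_logits + gamma * (cond_logits - uncond_logits)
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy; sampling also works
        # Append the chosen token to both contexts so they stay in sync.
        cond_ids = torch.cat([cond_ids, next_id], dim=-1)
        uncond_ids = torch.cat([uncond_ids, next_id], dim=-1)

    return tok.decode(cond_ids[0], skip_special_tokens=True)

print(cfg_generate("The capital of France is"))
```

With gamma = 1 this reduces to ordinary conditional decoding; values above 1 upweight tokens favored by the prompt, which is the effect behind the gains reported below.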

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Common Sense Reasoning | ARC (Easy) | LLaMA-7B + CFG (zero-shot) | Accuracy | 58.9 | #38 |
| Common Sense Reasoning | ARC (Easy) | LLaMA-13B + CFG (zero-shot) | Accuracy | 79.1 | #15 |
| Common Sense Reasoning | ARC (Easy) | LLaMA-30B + CFG (zero-shot) | Accuracy | 83.2 | #8 |
| Common Sense Reasoning | ARC (Easy) | LLaMA-65B + CFG (zero-shot) | Accuracy | 84.2 | #6 |
| Sentence Completion | HellaSwag | LLaMA-13B + CFG (zero-shot) | Accuracy | 82.1 | #35 |
| Sentence Completion | HellaSwag | LLaMA-30B + CFG (zero-shot) | Accuracy | 85.3 | #21 |
| Sentence Completion | HellaSwag | LLaMA-65B + CFG (zero-shot) | Accuracy | 86.3 | #16 |
| Language Modelling | LAMBADA | LLaMA-13B + CFG (zero-shot) | Accuracy | 82.2 | #8 |
| Language Modelling | LAMBADA | LLaMA-30B + CFG (zero-shot) | Accuracy | 83.9 | #5 |
| Language Modelling | LAMBADA | LLaMA-65B + CFG (zero-shot) | Accuracy | 84.0 | #4 |
| Text Generation | SciQ | LLaMA-13B + CFG (zero-shot) | Accuracy | 95.1 | #3 |
| Text Generation | SciQ | LLaMA-30B + CFG (zero-shot) | Accuracy | 96.4 | #2 |
| Text Generation | SciQ | LLaMA-65B + CFG (zero-shot) | Accuracy | 96.6 | #1 |
