SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot

2 Jan 2023 · Elias Frantar, Dan Alistarh

We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
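To make the semi-structured 2:4 and 4:8 patterns concrete, the PyTorch sketch below builds an n:m mask that keeps the n largest-magnitude weights in every group of m consecutive weights. This is illustrative only: the `nm_sparsify` helper is hypothetical, and plain magnitude selection is a stand-in for SparseGPT's actual error-aware selection, which uses approximate second-order information and updates the surviving weights.

```python
import torch

def nm_sparsify(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Zero all but the n largest-magnitude weights in each group of m
    consecutive weights along the input dimension (an n:m pattern).

    Hypothetical helper for illustration; SparseGPT itself chooses which
    weights to drop via approximate second-order information, not magnitude.
    """
    out_features, in_features = weight.shape
    assert in_features % m == 0, "input dim must be divisible by group size m"
    groups = weight.reshape(out_features, in_features // m, m)
    # Indices of the (m - n) smallest-magnitude weights in each group.
    _, drop = torch.topk(groups.abs(), m - n, dim=-1, largest=False)
    mask = torch.ones_like(groups, dtype=torch.bool)
    mask.scatter_(-1, drop, False)
    return (groups * mask).reshape(out_features, in_features)

w = torch.randn(8, 16)
w_24 = nm_sparsify(w, n=2, m=4)   # 2:4 pattern, 50% sparsity
w_48 = nm_sparsify(w, n=4, m=8)   # 4:8 pattern, 50% sparsity
assert (w_24 == 0).float().mean() == 0.5
```

Both patterns remove exactly half the weights, but 2:4 is the stricter constraint (and the one NVIDIA sparse tensor cores accelerate), which is why the 2:4 rows below trail the 4:8 and unstructured 50% rows slightly.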


Results from the Paper


 Ranked #1 on Language Modelling on WikiText-2 (using extra training data)

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Common Sense Reasoning | ARC (Challenge) | OPT-175B | Accuracy | 43.94 | #38 |
| Common Sense Reasoning | ARC (Challenge) | SparseGPT (175B, 50% sparsity) | Accuracy | 41.3 | #40 |
| Common Sense Reasoning | ARC (Challenge) | SparseGPT (175B, 4:8 sparsity) | Accuracy | 39.85 | #41 |
| Common Sense Reasoning | ARC (Challenge) | SparseGPT (175B, 2:4 sparsity) | Accuracy | 38.99 | #42 |
| Common Sense Reasoning | ARC (Challenge) | OPT-175B (50% sparsity) | Accuracy | 25.6 | #48 |
| Common Sense Reasoning | ARC (Easy) | OPT-175B | Accuracy | 71.04 | #27 |
| Common Sense Reasoning | ARC (Easy) | SparseGPT (175B, 50% sparsity) | Accuracy | 69.65 | #32 |
| Common Sense Reasoning | ARC (Easy) | SparseGPT (175B, 4:8 sparsity) | Accuracy | 68.35 | #35 |
| Common Sense Reasoning | ARC (Easy) | SparseGPT (175B, 2:4 sparsity) | Accuracy | 67.08 | #37 |
| Common Sense Reasoning | ARC (Easy) | OPT-175B (50% sparsity) | Accuracy | 28.03 | #43 |
| Language Modelling | LAMBADA | SparseGPT (175B, 2:4 sparsity) | Accuracy | 79.47 | #13 |
| Language Modelling | LAMBADA | SparseGPT (175B, 4:8 sparsity) | Accuracy | 78.77 | #14 |
| Language Modelling | LAMBADA | SparseGPT (175B, 50% sparsity) | Accuracy | 76.51 | #17 |
| Language Modelling | LAMBADA | OPT-175B | Accuracy | 75.59 | #19 |
| Language Modelling | LAMBADA | OPT-175B (50% sparsity) | Accuracy | 0.02 | #33 |
| Question Answering | PIQA | OPT-175B | Accuracy | 81.07 | #23 |
| Question Answering | PIQA | SparseGPT (175B, 50% sparsity) | Accuracy | 80.63 | #25 |
| Question Answering | PIQA | SparseGPT (175B, 2:4 sparsity) | Accuracy | 79.54 | #30 |
| Question Answering | PIQA | SparseGPT (175B, 4:8 sparsity) | Accuracy | 79.54 | #30 |
| Question Answering | PIQA | OPT-175B (50% sparsity) | Accuracy | 54.73 | #61 |
| Question Answering | StoryCloze | OPT-175B | Accuracy | 79.82 | #10 |
| Question Answering | StoryCloze | SparseGPT (175B, 50% sparsity) | Accuracy | 78.87 | #11 |
| Question Answering | StoryCloze | SparseGPT (175B, 4:8 sparsity) | Accuracy | 77.02 | #14 |
| Question Answering | StoryCloze | SparseGPT (175B, 2:4 sparsity) | Accuracy | 76.19 | #16 |
| Question Answering | StoryCloze | OPT-175B (50% sparsity) | Accuracy | 47.10 | #23 |
| Language Modelling | WikiText-2 | SparseGPT (175B, 50% sparsity) | Test perplexity | 8.21 | #1 |
| Language Modelling | WikiText-2 | OPT-175B | Test perplexity | 8.34 | #2 |
| Language Modelling | WikiText-2 | SparseGPT (175B, 4:8 sparsity) | Test perplexity | 8.45 | #3 |
| Language Modelling | WikiText-2 | SparseGPT (175B, 2:4 sparsity) | Test perplexity | 8.73 | #4 |
| Language Modelling | WikiText-2 | OPT-175B (50% sparsity) | Test perplexity | 234.77 | #38 |
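The WikiText-2 numbers above are test perplexities, i.e. the exponential of the average per-token negative log-likelihood over the test set. The sketch below shows one common way to compute this for a Hugging Face-style causal LM; the `perplexity` helper, the non-overlapping windowing, and the model interface are assumptions for illustration, not the paper's exact evaluation harness.

```python
import math
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor, window: int = 2048) -> float:
    """exp(mean per-token NLL) over a long token stream of shape [1, seq_len].

    Assumes a Hugging Face-style causal LM whose forward pass returns `.loss`
    as the mean shifted cross-entropy when `labels` are supplied.
    Non-overlapping windows are a simple approximation; sliding windows
    yield slightly lower (more accurate) perplexity.
    """
    total_nll, total_tokens = 0.0, 0
    for start in range(0, input_ids.size(1), window):
        chunk = input_ids[:, start:start + window]
        if chunk.size(1) < 2:  # need at least one next-token prediction
            break
        loss = model(chunk, labels=chunk).loss          # mean NLL for this chunk
        total_nll += loss.item() * (chunk.size(1) - 1)  # undo the mean
        total_tokens += chunk.size(1) - 1
    return math.exp(total_nll / total_tokens)
```

On this metric, lower is better; the table shows SparseGPT at 50% sparsity (8.21) slightly improving on the dense OPT-175B baseline (8.34), while naive 50% sparsification without SparseGPT's weight updates collapses to 234.77.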
