Q8BERT: Quantized 8Bit BERT

14 Oct 2019 · Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat

Recently, pre-trained Transformer-based language models such as BERT and GPT have shown great improvement on many Natural Language Processing (NLP) tasks. However, these models contain a large number of parameters. The emergence of even larger and more accurate models such as GPT-2 and Megatron suggests a trend toward ever larger pre-trained Transformer models. However, deploying these large models in production environments is a complex task requiring a large amount of compute, memory, and power resources. In this work we show how to perform quantization-aware training during the fine-tuning phase of BERT in order to compress BERT by $4\times$ with minimal accuracy loss. Furthermore, the resulting quantized model can accelerate inference when run on hardware that supports 8-bit integer operations.
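The key mechanism is quantization-aware training: during fine-tuning, weights and activations pass through a simulated ("fake") 8-bit quantization step in the forward pass so the model learns to tolerate rounding error, while gradients bypass the non-differentiable rounding via a straight-through estimator. The following is a minimal PyTorch sketch of symmetric linear fake quantization in this spirit; the function name and details are illustrative assumptions, not the paper's released implementation.

```python
# Illustrative sketch of symmetric linear 8-bit fake quantization for
# quantization-aware training. Names and details are assumptions, not the
# authors' code.
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize x to signed num_bits integers, then dequantize back to float.

    Applied in the forward pass during fine-tuning so the network adapts to
    8-bit rounding; gradients flow through unchanged via a straight-through
    estimator.
    """
    qmax = 2 ** (num_bits - 1) - 1                      # 127 for 8 bits
    scale = x.abs().max().clamp(min=1e-8) / qmax        # symmetric per-tensor scale
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    x_q = q * scale
    # Straight-through estimator: forward value is x_q, backward gradient is 1.
    return x + (x_q - x).detach()

# Usage: fake-quantize a weight matrix inside a training step.
w = torch.randn(4, 4, requires_grad=True)
loss = fake_quantize(w).sum()
loss.backward()   # gradients reach w despite the rounding in the forward pass
```

At inference time, a model trained this way can be exported with true 8-bit integer weights and activations, which is where the memory compression and potential speedup on integer-optimized hardware come from.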


Results from Other Papers


| Task | Dataset | Model | Metric | Value | Rank |
|---|---|---|---|---|---|
| Linguistic Acceptability | CoLA | Q8BERT (Zafrir et al., 2019) | Accuracy | 65.0 | #24 |
| Semantic Textual Similarity | MRPC | Q8BERT (Zafrir et al., 2019) | Accuracy | 89.7 | #17 |
| Natural Language Inference | MultiNLI | Q8BERT (Zafrir et al., 2019) | Matched | 85.6 | #27 |
| Natural Language Inference | QNLI | Q8BERT (Zafrir et al., 2019) | Accuracy | 93.0 | #22 |
| Natural Language Inference | RTE | Q8BERT (Zafrir et al., 2019) | Accuracy | 84.8 | #26 |
| Sentiment Analysis | SST-2 Binary classification | Q8BERT (Zafrir et al., 2019) | Accuracy | 94.7 | #31 |
| Semantic Textual Similarity | STS Benchmark | Q8BERT (Zafrir et al., 2019) | Pearson Correlation | 0.911 | #13 |
