TinyBERT: Distilling BERT for Natural Language Understanding

23 Sep 2019 · Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Language model pre-training, such as BERT, has significantly improved the performances of many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, so it is difficult to effectively execute them on some resource-restricted devices.
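The "distilling" in the title refers to knowledge distillation: training a small student model to match a large teacher's output distribution. The excerpt above does not spell out TinyBERT's full two-stage, layer-to-layer objective, so the sketch below shows only the generic soft-target loss at the prediction layer; the function names and toy logits are illustrative, not from the paper.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's softened outputs and the student's.

    Minimizing this pushes the student's predictive distribution toward the
    teacher's, which is the core mechanism behind compressing a large model
    like BERT into a smaller one. A temperature T > 1 softens both
    distributions so the teacher's relative preferences among wrong classes
    also carry signal.
    """
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T))
    return float(-np.sum(p_teacher * log_p_student, axis=-1).mean())

# Toy example: logits for a batch of 2 examples over 3 classes.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.5, 0.1]])
student = np.array([[3.0, 1.5, 0.2], [0.5, 2.5, 0.3]])
print(distillation_loss(student, teacher))
```

In practice this term is combined with the ordinary supervised loss on ground-truth labels; TinyBERT additionally distills intermediate representations, which this sketch omits.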


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Linguistic Acceptability | CoLA | TinyBERT | Accuracy | 43.3% | #17 |
| Linguistic Acceptability | CoLA Dev | TinyBERT (M=6; d'=768; d'i=3072) | Accuracy | 54 | #1 |
| Semantic Textual Similarity | MRPC | TinyBERT | Accuracy | 86.4% | #15 |
| Semantic Textual Similarity | MRPC Dev | TinyBERT (M=6; d'=768; d'i=3072) | Accuracy | 86.3 | #2 |
| Natural Language Inference | MultiNLI | TinyBERT | Matched | 82.5 | #13 |
| Natural Language Inference | MultiNLI | TinyBERT | Mismatched | 81.8 | #11 |
| Natural Language Inference | MultiNLI Dev | TinyBERT (M=6; d'=768; d'i=3072) | Matched | 84.5 | #1 |
| Natural Language Inference | MultiNLI Dev | TinyBERT (M=6; d'=768; d'i=3072) | Mismatched | 84.5 | #1 |
| Natural Language Inference | QNLI | TinyBERT | Accuracy | 87.7% | #16 |
| Paraphrase Identification | Quora Question Pairs | TinyBERT | F1 | 71.3 | #3 |
| Natural Language Inference | RTE | TinyBERT | Accuracy | 62.9% | #17 |
| Question Answering | SQuAD1.1 dev | TinyBERT (M=6; d'=768; d'i=3072) | EM | 79.7 | #10 |
| Question Answering | SQuAD1.1 dev | TinyBERT (M=6; d'=768; d'i=3072) | F1 | 87.5 | #12 |
| Question Answering | SQuAD2.0 dev | TinyBERT (M=6; d'=768; d'i=3072) | F1 | 73.4 | #14 |
| Question Answering | SQuAD2.0 dev | TinyBERT (M=6; d'=768; d'i=3072) | EM | 69.9 | #12 |
| Sentiment Analysis | SST-2 Binary classification | TinyBERT | Accuracy | 92.6 | #15 |
| Semantic Textual Similarity | STS Benchmark | TinyBERT | Pearson Correlation | 0.799 | #13 |
