ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

ICLR 2020 · Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut

Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations and longer training times...


Results from the Paper


TASK                         DATASET                      MODEL                    METRIC NAME          METRIC VALUE  GLOBAL RANK
Linguistic Acceptability     CoLA                         ALBERT                   Accuracy             69.1%         #2
Semantic Textual Similarity  MRPC                         ALBERT                   Accuracy             93.4%         #1
Natural Language Inference   MultiNLI                     ALBERT                   Matched              91.3          #3
Natural Language Inference   QNLI                         ALBERT                   Accuracy             99.2%         #1
Question Answering           Quora Question Pairs         ALBERT                   Accuracy             90.5%         #2
Natural Language Inference   RTE                          ALBERT                   Accuracy             89.2%         #3
Question Answering           SQuAD2.0                     ALBERT (ensemble model)  EM                   89.731        #12
Question Answering           SQuAD2.0                     ALBERT (ensemble model)  F1                   92.215        #13
Question Answering           SQuAD2.0                     ALBERT (single model)    EM                   88.107        #38
Question Answering           SQuAD2.0                     ALBERT (single model)    F1                   90.902        #41
Question Answering           SQuAD2.0 dev                 ALBERT xlarge            F1                   85.9          #7
Question Answering           SQuAD2.0 dev                 ALBERT xlarge            EM                   83.1          #5
Question Answering           SQuAD2.0 dev                 ALBERT xxlarge           F1                   88.1          #4
Question Answering           SQuAD2.0 dev                 ALBERT xxlarge           EM                   85.1          #4
Question Answering           SQuAD2.0 dev                 ALBERT large             F1                   82.1          #9
Question Answering           SQuAD2.0 dev                 ALBERT large             EM                   79            #7
Question Answering           SQuAD2.0 dev                 ALBERT base              F1                   79.1          #11
Question Answering           SQuAD2.0 dev                 ALBERT base              EM                   76.1          #9
Sentiment Analysis           SST-2 Binary classification  ALBERT                   Accuracy             97.1          #2
Semantic Textual Similarity  STS Benchmark                ALBERT                   Pearson Correlation  0.925         #1
Natural Language Inference   WNLI                         ALBERT                   Accuracy             91.8%         #2

Methods used in the Paper