SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Transfer learning has fundamentally changed the landscape of natural language processing (NLP) research. Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to the limited data of downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the downstream data and forget the knowledge of the pre-trained model. To address this issue in a principled manner, we propose a new computational framework for robust and efficient fine-tuning of pre-trained language models. Specifically, the framework has two key ingredients: (i) smoothness-inducing regularization, which effectively manages the capacity of the model, and (ii) Bregman proximal point optimization, a class of trust-region methods that prevents knowledge forgetting. Our experiments demonstrate that the proposed method achieves state-of-the-art performance on multiple NLP benchmarks.
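The two ingredients can be illustrated with a toy sketch. The snippet below is not the paper's implementation (which regularizes fine-tuning of large Transformers): it approximates the smoothness-inducing regularizer, max over ||delta|| <= eps of the symmetric KL between f(x) and f(x + delta), by projected gradient ascent on delta for a tiny NumPy softmax classifier, and models the Bregman proximal penalty as a symmetric KL between the current and previous models' predictions. All function names, the finite-difference gradient, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def sym_kl(p, q, tiny=1e-12):
    # Symmetric KL divergence between two probability vectors.
    p, q = p + tiny, q + tiny
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def smoothness_regularizer(W, x, eps=0.1, steps=3, lr=0.05, fd=1e-4):
    """Approximate max_{||delta||_inf <= eps} KL_sym(f(x), f(x+delta))
    by projected gradient ascent on delta (finite-difference gradients;
    the paper uses autograd on a Transformer instead)."""
    p = softmax(W @ x)                     # clean prediction, held fixed
    delta = np.full(x.shape, eps * 1e-2)   # small deterministic init
    for _ in range(steps):
        base = sym_kl(p, softmax(W @ (x + delta)))
        grad = np.zeros_like(delta)
        for i in range(delta.size):
            d = delta.copy()
            d[i] += fd
            grad[i] = (sym_kl(p, softmax(W @ (x + d))) - base) / fd
        # Ascent step followed by projection back onto the eps-ball.
        delta = np.clip(delta + lr * np.sign(grad), -eps, eps)
    return sym_kl(p, softmax(W @ (x + delta)))

def bregman_proximal_penalty(W_new, W_old, xs):
    # Trust-region-style penalty: mean symmetric KL between the
    # updated model's predictions and the previous iterate's.
    return float(np.mean([sym_kl(softmax(W_new @ x), softmax(W_old @ x))
                          for x in xs]))

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))   # toy 2-class linear classifier
x = rng.normal(size=4)
r = smoothness_regularizer(W, x)
print(f"smoothness regularizer R_s ~= {r:.4f}")
```

In the full method both terms are added to the task loss: the first keeps predictions stable under small input perturbations, and the second keeps each update close to the previous iterate, which is what mitigates catastrophic forgetting.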

Published at ACL 2020.

Results from the Paper


Task | Dataset | Model | Metric | Value | Global Rank
Natural Language Inference | AX | T5 | Accuracy | 53.1 | #1
Natural Language Understanding | GLUE | MT-DNN-SMART | Average | 89.9 | #1
Natural Language Inference | MNLI + SNLI + ANLI + FEVER | SMARTRoBERTa-LARGE | Dev Accuracy (%) | 57.1 | #1
Natural Language Inference | MNLI + SNLI + ANLI + FEVER | SMARTRoBERTa-LARGE | Test Accuracy (%) | 57.1 | #1
Semantic Textual Similarity | MRPC | SMART | Accuracy | 91.3% | #6
Semantic Textual Similarity | MRPC | MT-DNN-SMART | Accuracy | 93.7% | #1
Semantic Textual Similarity | MRPC | MT-DNN-SMART | F1 | 91.7 | #5
Natural Language Inference | MultiNLI | MT-DNN-SMART | Accuracy | 85.7 | #1
Natural Language Inference | MultiNLI | T5 | Matched | 92.0 | #2
Natural Language Inference | MultiNLI | T5 | Mismatched | 91.7 | #2
Natural Language Inference | MultiNLI | SMART-BERT | Dev Matched | 85.6 | #2
Natural Language Inference | MultiNLI | SMART-BERT | Dev Mismatched | 86.0 | #2
Natural Language Inference | MultiNLI | SMART+BERT-BASE | Accuracy | 85.6 | #3
Natural Language Inference | MultiNLI | SMARTRoBERTa | Dev Matched | 91.1 | #1
Natural Language Inference | MultiNLI | SMARTRoBERTa | Dev Mismatched | 91.3 | #1
Natural Language Inference | MultiNLI | MT-DNN-SMARTv0 | Accuracy | 85.7 | #1
Natural Language Inference | QNLI | MT-DNN-SMART | Accuracy | 99.2% | #1
Natural Language Inference | QNLI | ALICE | Accuracy | 99.2% | #1
Paraphrase Identification | Quora Question Pairs | SMART-BERT | Dev Accuracy | 91.5 | #2
Paraphrase Identification | Quora Question Pairs | SMART-BERT | Dev F1 | 88.5 | #1
Paraphrase Identification | Quora Question Pairs | ALICE | F1 | 90.7 | #1
Paraphrase Identification | Quora Question Pairs | FreeLB | Accuracy | 74.8 | #20
Paraphrase Identification | Quora Question Pairs | FreeLB | Dev Accuracy | 92.6 | #1
Natural Language Inference | RTE | SMARTRoBERTa | Accuracy | 92.0% | #12
Natural Language Inference | RTE | SMART | Accuracy | 71.2% | #50
Natural Language Inference | RTE | T5-XXL 11B | Accuracy | 92.5% | #8
Natural Language Inference | RTE | SMART-BERT | Accuracy | 71.2% | #50
Natural Language Inference | SciTail | MT-DNN-SMART_10%ofTrainingData | Dev Accuracy | 91.3 | #2
Natural Language Inference | SciTail | MT-DNN-SMARTLARGEv0 | Dev Accuracy (%) | 96.6 | #1
Natural Language Inference | SciTail | MT-DNN-SMARTLARGEv0 | Test Accuracy (%) | 95.2 | #1
Natural Language Inference | SciTail | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy | 96.1 | #1
Natural Language Inference | SciTail | MT-DNN-SMART_1%ofTrainingData | Dev Accuracy | 88.6 | #3
Natural Language Inference | SciTail | MT-DNN-SMART_0.1%ofTrainingData | Dev Accuracy | 82.3 | #4
Natural Language Inference | SNLI | MT-DNN-SMART_1%ofTrainingData | Dev Accuracy | 86.0 | #3
Natural Language Inference | SNLI | MT-DNN-SMART_100%ofTrainingData | Dev Accuracy | 91.6 | #1
Natural Language Inference | SNLI | MT-DNN-SMART_0.1%ofTrainingData | Dev Accuracy | 82.7 | #4
Natural Language Inference | SNLI | MT-DNN-SMARTLARGEv0 | Test Accuracy (%) | 91.7 | #7
Natural Language Inference | SNLI | MT-DNN-SMARTLARGEv0 | Dev Accuracy (%) | 92.6 | #1
Natural Language Inference | SNLI | MT-DNN-SMART_10%ofTrainingData | Dev Accuracy | 88.7 | #2
Sentiment Analysis | SST-2 Binary classification | SMARTRoBERTa | Dev Accuracy | 96.9 | #1
Sentiment Analysis | SST-2 Binary classification | MT-DNN | Accuracy | 93.6 | #38
Sentiment Analysis | SST-2 Binary classification | SMART-MT-DNN | Dev Accuracy | 96.1 | #2
Sentiment Analysis | SST-2 Binary classification | SMART-BERT | Dev Accuracy | 93.0 | #3
Sentiment Analysis | SST-2 Binary classification | SMART+BERT-BASE | Accuracy | 93.0 | #44
Sentiment Analysis | SST-2 Binary classification | MT-DNN-SMART | Accuracy | 97.5 | #1
Semantic Textual Similarity | STS Benchmark | SMART-BERT | Dev Spearman Correlation | 89.4 | #2
Semantic Textual Similarity | STS Benchmark | SMART-BERT | Dev Pearson Correlation | 90.0 | #2
Semantic Textual Similarity | STS Benchmark | MT-DNN-SMART | Pearson Correlation | 0.929 | #1
Semantic Textual Similarity | STS Benchmark | MT-DNN-SMART | Spearman Correlation | 0.925 | #2
Semantic Textual Similarity | STS Benchmark | SMARTRoBERTa | Dev Spearman Correlation | 92.6 | #1
Semantic Textual Similarity | STS Benchmark | SMARTRoBERTa | Dev Pearson Correlation | 92.8 | #1
