Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

arXiv 2019
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice...
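
The title's "unified text-to-text" framing means every task in the results table, classification and regression included, is handled by feeding the model an input string and training it to emit a target string. A minimal sketch of that preprocessing follows. The task prefixes are modeled on the examples in the paper, but the exact strings, the helper names, and the 0.2 rounding step for the regression target are assumptions of this sketch, not the released T5 preprocessing code.

```python
from __future__ import annotations

# Hypothetical helpers: each casts one task example into an (input, target)
# string pair. Prefix strings are assumptions modeled on the paper's examples.


def cola_example(sentence: str, acceptable: bool) -> tuple[str, str]:
    """Binary classification becomes generating a label word."""
    target = "acceptable" if acceptable else "not acceptable"
    return f"cola sentence: {sentence}", target


def stsb_example(sentence1: str, sentence2: str, score: float) -> tuple[str, str]:
    """Regression becomes generating a number written as text.

    Assumption: the score is snapped to the nearest multiple of 0.2.
    Exact halfway cases follow Python's round-half-to-even rule on score * 5.
    """
    snapped = round(score * 5) / 5
    return f"stsb sentence1: {sentence1} sentence2: {sentence2}", f"{snapped:.1f}"


def translation_example(english: str, german: str) -> tuple[str, str]:
    """Translation is already text-to-text; only a task prefix is added."""
    return f"translate English to German: {english}", german


if __name__ == "__main__":
    for source, target in [
        cola_example("The course is jumping well.", False),
        stsb_example("The rhino grazed on the grass.", "A rhino is grazing in a field.", 3.8),
        translation_example("That is good.", "Das ist gut."),
    ]:
        print(f"{source!r} -> {target!r}")
```

Because every target is a string, one model, loss, and decoding procedure can serve all of these tasks; a predicted string that is not a valid label simply counts as wrong.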

Task | Dataset | Model | Metric | Value | Global rank
Question Answering | BoolQ | T5-11B | Accuracy | 91.0 | #1
Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-1 | 43.52 | #5
Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-2 | 21.55 | #1
Document Summarization | CNN / Daily Mail | T5-11B | ROUGE-L | 40.69 | #2
Linguistic Acceptability | CoLA | T5-Small | Accuracy | 41.0% | #18
Linguistic Acceptability | CoLA | T5-Base | Accuracy | 51.1% | #14
Linguistic Acceptability | CoLA | T5-Large | Accuracy | 61.2% | #10
Linguistic Acceptability | CoLA | T5-3B | Accuracy | 67.1% | #7
Linguistic Acceptability | CoLA | T5-11B | Accuracy | 70.8% | #1
Natural Language Inference | CommitmentBank | T5-11B | F1 | 93.0 | #1
Question Answering | COPA | T5-11B | Accuracy | 94.8 | #1
Semantic Textual Similarity | MRPC | T5-Small | Accuracy | 86.6% | #14
Semantic Textual Similarity | MRPC | T5-Small | F1 | 89.7 | #6
Semantic Textual Similarity | MRPC | T5-Base | Accuracy | 87.5% | #12
Semantic Textual Similarity | MRPC | T5-Base | F1 | 90.7 | #5
Semantic Textual Similarity | MRPC | T5-Large | Accuracy | 89.9% | #8
Semantic Textual Similarity | MRPC | T5-Large | F1 | 92.4 | #2
Semantic Textual Similarity | MRPC | T5-3B | Accuracy | 89.2% | #10
Semantic Textual Similarity | MRPC | T5-3B | F1 | 92.5 | #1
Semantic Textual Similarity | MRPC | T5-11B | Accuracy | 90.0% | #7
Semantic Textual Similarity | MRPC | T5-11B | F1 | 91.9 | #3
Natural Language Inference | MultiNLI | T5-Small | Matched | 82.4 | #14
Natural Language Inference | MultiNLI | T5-Small | Mismatched | 82.3 | #10
Natural Language Inference | MultiNLI | T5-Base | Matched | 87.1 | #9
Natural Language Inference | MultiNLI | T5-Base | Mismatched | 86.2 | #6
Natural Language Inference | MultiNLI | T5-Large | Matched | 89.9 | #5
Natural Language Inference | MultiNLI | T5-Large | Mismatched | 89.6 | #4
Natural Language Inference | MultiNLI | T5-3B | Matched | 91.4 | #2
Natural Language Inference | MultiNLI | T5-3B | Mismatched | 91.2 | #2
Natural Language Inference | MultiNLI | T5-11B | Matched | 92.0 | #1
Natural Language Inference | MultiNLI | T5-11B | Mismatched | 91.7 | #1
Question Answering | MultiRC | T5-11B | F1a | 88.2 | #1
Natural Language Inference | QNLI | T5-Small | Accuracy | 90.3% | #14
Natural Language Inference | QNLI | T5-Base | Accuracy | 93.7% | #10
Natural Language Inference | QNLI | T5-Large | Accuracy | 94.8% | #7
Natural Language Inference | QNLI | T5-3B | Accuracy | 96.3% | #4
Natural Language Inference | QNLI | T5-11B | Accuracy | 96.7% | #3
Question Answering | Quora Question Pairs | T5-Small | Accuracy | 88.0% | #12
Question Answering | Quora Question Pairs | T5-Base | Accuracy | 89.4% | #9
Question Answering | Quora Question Pairs | T5-Large | Accuracy | 89.9% | #6
Question Answering | Quora Question Pairs | T5-3B | Accuracy | 89.7% | #8
Question Answering | Quora Question Pairs | T5-11B | Accuracy | 90.4% | #3
Question Answering | ReCoRD | T5-11B | F1 | 93.3 | #1
Natural Language Inference | RTE | T5-Small | Accuracy | 69.9% | #14
Natural Language Inference | RTE | T5-Base | Accuracy | 80.1% | #9
Natural Language Inference | RTE | T5-Large | Accuracy | 87.2% | #5
Natural Language Inference | RTE | T5-3B | Accuracy | 91.1% | #2
Natural Language Inference | RTE | T5-11B | Accuracy | 92.5% | #1
Question Answering | SQuAD1.1 dev | T5-Small | EM | 79.1 | #11
Question Answering | SQuAD1.1 dev | T5-Small | F1 | 87.24 | #13
Question Answering | SQuAD1.1 dev | T5-Base | EM | 85.44 | #6
Question Answering | SQuAD1.1 dev | T5-Base | F1 | 92.08 | #6
Question Answering | SQuAD1.1 dev | T5-Large | EM | 86.66 | #5
Question Answering | SQuAD1.1 dev | T5-Large | F1 | 93.79 | #5
Question Answering | SQuAD1.1 dev | T5-3B | EM | 88.53 | #4
Question Answering | SQuAD1.1 dev | T5-3B | F1 | 94.95 | #4
Question Answering | SQuAD1.1 dev | T5-11B | EM | 90.06 | #1
Question Answering | SQuAD1.1 dev | T5-11B | F1 | 95.64 | #2
Sentiment Analysis | SST-2 Binary classification | T5-Small | Accuracy | 91.8 | #16
Sentiment Analysis | SST-2 Binary classification | T5-Base | Accuracy | 95.2 | #8
Sentiment Analysis | SST-2 Binary classification | T5-Large | Accuracy | 96.3 | #6
Sentiment Analysis | SST-2 Binary classification | T5-3B | Accuracy | 97.4 | #1
Sentiment Analysis | SST-2 Binary classification | T5-11B | Accuracy | 97.1 | #2
Semantic Textual Similarity | STS Benchmark | T5-Small | Pearson Correlation | 0.856 | #11
Semantic Textual Similarity | STS Benchmark | T5-Small | Spearman Correlation | 0.85 | #7
Semantic Textual Similarity | STS Benchmark | T5-Base | Pearson Correlation | 0.894 | #9
Semantic Textual Similarity | STS Benchmark | T5-Large | Pearson Correlation | 0.899 | #8
Semantic Textual Similarity | STS Benchmark | T5-Large | Spearman Correlation | 0.892 | #3
Semantic Textual Similarity | STS Benchmark | T5-3B | Pearson Correlation | 0.906 | #7
Semantic Textual Similarity | STS Benchmark | T5-3B | Spearman Correlation | 0.898 | #2
Semantic Textual Similarity | STS Benchmark | T5-11B | Pearson Correlation | 0.925 | #1
Semantic Textual Similarity | STS Benchmark | T5-11B | Spearman Correlation | 0.921 | #1
Machine Translation | WMT2014 English-French | T5 | BLEU score | 43.4 | #4
Machine Translation | WMT2014 English-German | T5-11B | BLEU score | 32.1 | #2
Natural Language Inference | WNLI | T5-Small | Accuracy | 69.2% | #7
Natural Language Inference | WNLI | T5-Base | Accuracy | 78.8% | #6
Natural Language Inference | WNLI | T5-Large | Accuracy | 85.6% | #5
Natural Language Inference | WNLI | T5-3B | Accuracy | 89.7% | #3
Natural Language Inference | WNLI | T5-11B | Accuracy | 93.2% | #1
Word Sense Disambiguation | Words in Context | T5-11B | Accuracy | 76.1 | #1
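
The SQuAD1.1 dev rows report exact match (EM) and F1 on a 0-100 scale. A minimal sketch of how such answer-level scores are commonly computed is shown below, modeled on the normalization used by the official SQuAD v1.1 evaluation script (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace). The function names are illustrative, and this is not the code that produced the leaderboard numbers above.

```python
import collections
import re
import string

# Sketch of SQuAD-v1.1-style answer scoring; function names are illustrative.

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation, drop the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall over normalized bags of words."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    overlap = sum((collections.Counter(pred_tokens) & collections.Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def squad_scores(predictions, references):
    """Average each prediction's best score over its reference answers, as percentages."""
    n = len(predictions)
    em = sum(max(exact_match(p, r) for r in refs) for p, refs in zip(predictions, references))
    f1 = sum(max(token_f1(p, r) for r in refs) for p, refs in zip(predictions, references))
    return 100.0 * em / n, 100.0 * f1 / n


if __name__ == "__main__":
    preds = ["Paris", "1889"]
    refs = [["Paris", "Paris, France"], ["1887"]]
    print(squad_scores(preds, refs))  # (50.0, 50.0)
```

Each prediction is scored against its best-matching reference. A partially overlapping answer earns F1 credit but no EM credit, which is consistent with EM trailing F1 by several points in every SQuAD1.1 row above.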

Methods used in the Paper