Toward Efficient Language Model Pretraining and Downstream Adaptation via Self-Evolution: A Case Study on SuperGLUE

This technical report briefly describes the JDExplore d-team's Vega v2 submission to the SuperGLUE leaderboard. SuperGLUE is more challenging than the widely used General Language Understanding Evaluation (GLUE) benchmark, comprising eight difficult language understanding tasks that span question answering, natural language inference, word sense disambiguation, coreference resolution, and reasoning. [Method] Rather than arbitrarily increasing the size of a pretrained language model (PLM), we aim to 1) fully extract knowledge from the pretraining data given a fixed parameter budget, e.g., 6B parameters, and 2) effectively transfer that knowledge to downstream tasks. To achieve goal 1), we propose self-evolution learning for PLMs, which wisely selects the informative tokens to be masked and supervises the masked language modeling (MLM) objective with rectified smooth labels. For goal 2), we leverage prompt transfer to improve low-resource tasks by transferring knowledge from the foundation model and related downstream tasks to the target task. [Results] According to our submission record (Oct. 2022), with these optimized pretraining and fine-tuning strategies, our 6B Vega model achieved new state-of-the-art performance on 4 of the 8 tasks and sat atop the SuperGLUE leaderboard on Oct. 8, 2022, with an average score of 91.3.
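To make the self-evolution learning idea concrete, below is a minimal PyTorch sketch of one pretraining step. This is our illustrative reading of the abstract, not the authors' released implementation: we assume a HuggingFace-style masked LM whose forward pass returns `.logits`, score each token's informativeness by the model's own per-token loss, mask the hardest tokens, and supervise them with rectified smooth labels that mix the one-hot target with the model's reference distribution. The function name `self_evolution_mlm_step` and the hyperparameters `mask_ratio` and `smooth_alpha` are hypothetical.

```python
import torch
import torch.nn.functional as F

def self_evolution_mlm_step(model, input_ids, mask_token_id,
                            mask_ratio=0.15, smooth_alpha=0.1):
    """One illustrative self-evolution MLM step (hypothetical sketch).

    1) Score each token's informativeness with the model's own loss.
    2) Mask the hardest `mask_ratio` fraction of tokens.
    3) Supervise the masked positions with rectified smooth labels:
       a mix of the one-hot target and the model's own reference
       distribution.
    """
    with torch.no_grad():
        # Reference pass on the unmasked input (assumes a
        # HuggingFace-style masked LM returning `.logits`).
        ref_logits = model(input_ids).logits                       # (B, T, V)
        token_loss = F.cross_entropy(ref_logits.transpose(1, 2),
                                     input_ids, reduction="none")  # (B, T)
        # Treat the highest-loss tokens as the informative ones to mask.
        k = max(1, int(mask_ratio * input_ids.size(1)))
        mask_pos = token_loss.topk(k, dim=1).indices               # (B, k)

    masked = input_ids.clone()
    masked.scatter_(1, mask_pos, mask_token_id)

    logits = model(masked).logits                                  # (B, T, V)
    vocab = logits.size(-1)
    idx = mask_pos.unsqueeze(-1).expand(-1, -1, vocab)
    masked_logits = logits.gather(1, idx)                          # (B, k, V)

    # Rectified smooth labels at the masked positions.
    targets = input_ids.gather(1, mask_pos)                        # (B, k)
    one_hot = F.one_hot(targets, vocab).float()
    ref_probs = ref_logits.gather(1, idx).softmax(-1)
    soft_labels = (1.0 - smooth_alpha) * one_hot + smooth_alpha * ref_probs

    # Soft-label cross entropy over the masked positions only.
    return -(soft_labels * masked_logits.log_softmax(-1)).sum(-1).mean()
```

The key design point in this sketch is that both the masking policy and the soft labels come from the model itself, so the supervision signal sharpens as pretraining progresses.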
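Similarly, the sketch below shows one plausible form of the KD-based prompt transfer applied to low-resource tasks (the "KD-based prompt transfer" entries in the results table): a soft prompt learned on a related source task initializes the target-task prompt, the backbone stays frozen, and a fine-tuned teacher supplies soft targets via knowledge distillation. The interfaces `student.embed`, the callable mapping embeddings to class logits, and the hyperparameters `temperature` and `kd_weight` are assumptions for illustration, not the paper's API.

```python
import torch
import torch.nn.functional as F

def kd_prompt_transfer_step(student, teacher, soft_prompt, batch,
                            temperature=2.0, kd_weight=0.5):
    """One illustrative KD-based prompt-transfer step (hypothetical).

    `soft_prompt` is a trainable (P, H) tensor initialized from a
    prompt learned on a related source task; the frozen student
    backbone sees it prepended to the input embeddings, while a
    fine-tuned teacher provides soft targets for distillation.
    """
    # Assumed interfaces: `student.embed` maps token ids to embeddings,
    # and calling student/teacher returns task logits of shape (B, C).
    input_embeds = student.embed(batch["input_ids"])               # (B, T, H)
    prompt = soft_prompt.unsqueeze(0).expand(input_embeds.size(0), -1, -1)
    student_logits = student(torch.cat([prompt, input_embeds], dim=1))

    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"])

    # Task loss on gold labels plus temperature-scaled KL to the teacher.
    ce = F.cross_entropy(student_logits, batch["labels"])
    kd = F.kl_div((student_logits / temperature).log_softmax(-1),
                  (teacher_logits / temperature).softmax(-1),
                  reduction="batchmean") * temperature ** 2
    return (1.0 - kd_weight) * ce + kd_weight * kd
```

Only `soft_prompt` receives gradients in this setup, so the transferred prompt carries the source-task knowledge while distillation injects the teacher's task-specific behavior.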


Results from the Paper


All metric values are on a 0–100 scale; ranks reflect leaderboard positions at submission time (Oct. 2022).

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Question Answering | BoolQ | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy | 92.0 | #3 |
| Question Answering | BoolQ | Vega v2 6B (fine-tuned) | Accuracy | 90.5 | #7 |
| Natural Language Inference | CommitmentBank | Turing NLR v5 XXL 5.4B (fine-tuned) | F1 | 95.9 | #3 |
| Natural Language Inference | CommitmentBank | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy | 97.6 | #5 |
| Natural Language Inference | CommitmentBank | Vega v2 6B (KD-based prompt transfer) | F1 | 98.6 | #2 |
| Natural Language Inference | CommitmentBank | Vega v2 6B (KD-based prompt transfer) | Accuracy | 99.2 | #2 |
| Question Answering | COPA | Vega v2 6B (KD-based prompt transfer) | Accuracy | 99.4 | #2 |
| Question Answering | COPA | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy | 98.2 | #6 |
| Question Answering | MultiRC | Vega v2 6B (fine-tuned) | F1 | 88.2 | #4 |
| Question Answering | MultiRC | Vega v2 6B (fine-tuned) | EM | 62.4 | #5 |
| Question Answering | MultiRC | Turing NLR v5 XXL 5.4B (fine-tuned) | F1 | 88.4 | #3 |
| Question Answering | MultiRC | Turing NLR v5 XXL 5.4B (fine-tuned) | EM | 63.0 | #4 |
| Common Sense Reasoning | ReCoRD | Vega v2 6B (fine-tuned) | F1 | 94.4 | #4 |
| Common Sense Reasoning | ReCoRD | Vega v2 6B (fine-tuned) | EM | 93.9 | #5 |
| Common Sense Reasoning | ReCoRD | Turing NLR v5 XXL 5.4B (fine-tuned) | F1 | 96.4 | #1 |
| Common Sense Reasoning | ReCoRD | Turing NLR v5 XXL 5.4B (fine-tuned) | EM | 95.9 | #1 |
| Natural Language Inference | RTE | Vega v2 6B (KD-based prompt transfer) | Accuracy | 96.0 | #1 |
| Natural Language Inference | RTE | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy | 94.1 | #3 |
| Coreference Resolution | Winograd Schema Challenge | Vega v2 6B (KD-based prompt transfer) | Accuracy | 98.6 | #2 |
| Coreference Resolution | Winograd Schema Challenge | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy | 97.3 | #4 |
| Word Sense Disambiguation | Words in Context | Vega v2 6B (fine-tuned) | Accuracy | 77.4 | #5 |
| Word Sense Disambiguation | Words in Context | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy | 77.1 | #7 |

Methods


No methods listed for this paper.