TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Abstractive Text Summarization	CNN / Daily Mail	BART+R3F	ROUGE-1	44.38	# 14
Abstractive Text Summarization	CNN / Daily Mail	BART+R3F	ROUGE-2	21.53	# 10
Abstractive Text Summarization	CNN / Daily Mail	BART+R3F	ROUGE-L	41.17	# 18
Text Summarization	GigaWord	BART-RXF	ROUGE-1	40.45	# 2
Text Summarization	GigaWord	BART-RXF	ROUGE-2	20.69	# 2
Text Summarization	GigaWord	BART-RXF	ROUGE-L	36.56	# 12
Text Summarization	Reddit TIFU	BART+R3F	ROUGE-1	30.31	# 2
Text Summarization	Reddit TIFU	BART+R3F	ROUGE-2	10.98	# 3
Text Summarization	Reddit TIFU	BART+R3F	ROUGE-L	24.74	# 3
Cross-Lingual Natural Language Inference	XNLI Zero-Shot English-to-French	XLM-R R4F	Accuracy	84.7%	# 1
Cross-Lingual Natural Language Inference	XNLI Zero-Shot English-to-German	XLM-R R4F	Accuracy	84.2%	# 1
Cross-Lingual Natural Language Inference	XNLI Zero-Shot English-to-Spanish	XLM-R R4F	Accuracy	85.2%	# 1


Better Fine-Tuning by Reducing Representational Collapse

ICLR 2021 · Armen Aghajanyan, Akshat Shrivastava, Anchit Gupta, Naman Goyal, Luke Zettlemoyer, Sonal Gupta

Although widely adopted, existing approaches for fine-tuning pre-trained language models have been shown to be unstable across hyper-parameter settings, motivating recent work on trust region methods. In this paper, we present a simplified and efficient method rooted in trust region theory that replaces previously used adversarial objectives with parametric noise (sampling from either a normal or uniform distribution), thereby discouraging representation change during fine-tuning when possible without hurting performance. We also introduce a new analysis to motivate the use of trust region methods more generally, by studying representational collapse: the degradation of generalizable representations from pre-trained models as they are fine-tuned for a specific end task. Extensive experiments show that our fine-tuning method matches or exceeds the performance of previous trust region methods on a range of understanding and generation tasks (including DailyMail/CNN, Gigaword, Reddit TIFU, and the GLUE benchmark), while also being much faster. We also show that it is less prone to representational collapse: the pre-trained models maintain more generalizable representations every time they are fine-tuned.
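The noise-based regularizer described above can be sketched as follows. This is a minimal, framework-free illustration, not the fairseq implementation. It assumes (following the full paper rather than this abstract) that the objective is the task loss plus a weight lam times the symmetric KL divergence between the model's predictions on clean inputs and on inputs perturbed by noise z, where z is drawn from N(0, sigma^2) or U(-sigma, sigma). The linear "model", the function names, and the parameter names `lam`, `sigma`, and `noise_type` are all hypothetical stand-ins chosen for illustration.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    # Symmetric KL: KL(p || q) + KL(q || p).
    return kl(p, q) + kl(q, p)


def sample_noise(dim, sigma, noise_type, rng):
    # Parametric noise: normal N(0, sigma^2) or uniform U(-sigma, sigma).
    if noise_type == "normal":
        return [rng.gauss(0.0, sigma) for _ in range(dim)]
    if noise_type == "uniform":
        return [rng.uniform(-sigma, sigma) for _ in range(dim)]
    raise ValueError(f"unknown noise_type: {noise_type!r}")


def linear_logits(weights, bias, x):
    # Hypothetical stand-in for a fine-tuned model: logits = W x + b.
    # In the paper's setting, x would be the input embeddings of a pre-trained model.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def noise_regularized_loss(weights, bias, x, label, lam=1.0, sigma=1e-5,
                           noise_type="normal", rng=None):
    # Returns (total_loss, task_loss) for a single example.
    rng = rng or random.Random(0)
    clean = softmax(linear_logits(weights, bias, x))
    task_loss = -math.log(clean[label])  # cross-entropy on the clean input
    z = sample_noise(len(x), sigma, noise_type, rng)
    noisy = softmax(linear_logits(weights, bias, [xi + zi for xi, zi in zip(x, z)]))
    # Penalize any change in the output distribution caused by the noise,
    # which discourages large representation changes during fine-tuning.
    return task_loss + lam * symmetric_kl(clean, noisy), task_loss


W = [[0.5, -0.2, 0.1], [-0.3, 0.4, 0.2]]
b = [0.0, 0.1]
x = [1.0, 2.0, -1.0]
total, task = noise_regularized_loss(W, b, x, label=1, lam=1.0, sigma=0.1,
                                     noise_type="uniform")
print(total, task)
```

Because the symmetric KL term is non-negative, the regularized loss is never smaller than the task loss; unlike an adversarial objective, it needs only one extra forward pass on a randomly perturbed input rather than an inner optimization to find the worst-case perturbation.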


Code


pytorch/fairseq (official) · 29,201 stars

cosmoquester/2021-dialogue-summary-… · 123 stars

cliang1453/camero

Tasks


Abstractive Text Summarization

Cross-Lingual Natural Language Inference

Text Summarization

Datasets

GLUE

SST

MultiNLI

SST-2

QNLI

MRPC

CoLA

CNN/Daily Mail

XNLI

Reddit TIFU

Results from the Paper


Ranked #1 on Cross-Lingual Natural Language Inference on XNLI Zero-Shot English-to-Spanish



