Subformer: A Parameter Reduced Transformer

The advent of the Transformer can arguably be described as a driving force behind many of the recent advances in natural language processing. However, despite its sizeable performance improvements, the model has recently been shown to be severely over-parameterized, making it parameter-inefficient and computationally expensive to train...


Results from the Paper


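Task                           | Dataset          | Model            | Metric name      | Metric value | Global rank
-------------------------------|------------------|------------------|------------------|--------------|------------
Abstractive Text Summarization | CNN / Daily Mail | Subformer-base   | ROUGE-1          | 40.9         | #10
Abstractive Text Summarization | CNN / Daily Mail | Subformer-base   | ROUGE-2          | 18.3         | #9
Abstractive Text Summarization | CNN / Daily Mail | Subformer-base   | ROUGE-L          | 37.7         | #10
Language Modelling             | WikiText-103     | Subformer        | Test perplexity  | 20.39        | #14
Language Modelling             | WikiText-103     | Subformer        | Number of params | 96M          | #16
Machine Translation            | WMT 2014 EN-DE   | Subformer-xlarge | BLEU score       | 29.3         | #1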

Methods used in the Paper