A Sandwich Transformer is a Transformer variant that reorders the self-attention and feedforward sublayers of the architecture to achieve better performance. The reordering follows the authors' finding that models with more self-attention sublayers toward the bottom and more feedforward sublayers toward the top tend to perform better.
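The ordering can be illustrated with a minimal sketch. It assumes the pattern from the cited paper, in which a model with n self-attention (`s`) and n feedforward (`f`) sublayers is arranged as k self-attention sublayers, then n - k interleaved `sf` pairs, then k feedforward sublayers. The function name `sandwich_order` and its parameters are hypothetical and chosen here for illustration only; this is not the authors' code.

```python
def sandwich_order(n: int, k: int) -> str:
    """Return a sublayer ordering string for a sandwich-style model.

    's' stands for a self-attention sublayer, 'f' for a feedforward sublayer.
    n is the number of sublayers of each type; k (an assumed "sandwich
    coefficient") is how many 's' are moved to the bottom and 'f' to the top.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # k attention sublayers, then (n - k) interleaved pairs, then k feedforward
    return "s" * k + "sf" * (n - k) + "f" * k


if __name__ == "__main__":
    # k = 0 gives the standard interleaved ordering
    print(sandwich_order(6, 0))
    # k = 2 moves two attention sublayers down and two feedforward sublayers up
    print(sandwich_order(6, 2))
```

Under this sketch the total number of sublayers of each type is unchanged for any k, so the reordering only changes where the sublayers sit in the stack.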
Source: Improving Transformer Models by Reordering their Sublayers
Task | Papers | Share |
---|---|---|
Language Modelling | 1 | 33.33% |
Machine Translation | 1 | 33.33% |
Translation | 1 | 33.33% |