Sandwich Transformer

Introduced by Press et al. in Improving Transformer Models by Reordering their Sublayers

A Sandwich Transformer is a Transformer variant that reorders the self-attention and feedforward sublayers of the architecture to improve performance. The ordering follows the authors' finding that models with more self-attention sublayers toward the bottom (near the input) and more feedforward sublayers toward the top (near the output) tend to perform better.
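
The reordering can be illustrated with a short sketch. It assumes the paper's notation as we understand it: "s" for a self-attention sublayer, "f" for a feedforward sublayer, and a sandwich pattern made of k self-attention sublayers, then n - k interleaved "sf" pairs, then k feedforward sublayers. The function name sandwich_pattern and its parameters are hypothetical, chosen for illustration only; this is not the authors' code.

```python
def sandwich_pattern(n: int, k: int) -> str:
    """Return a sublayer ordering string for n layers with sandwich coefficient k.

    's' = self-attention sublayer, 'f' = feedforward sublayer.
    Assumed layout: k leading 's', then (n - k) interleaved 'sf' pairs,
    then k trailing 'f'. With k = 0 this is the usual interleaved ordering.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    return "s" * k + "sf" * (n - k) + "f" * k


if __name__ == "__main__":
    baseline = sandwich_pattern(4, 0)   # plain interleaved Transformer
    sandwich = sandwich_pattern(4, 2)   # attention pushed down, feedforward pushed up
    print(baseline)  # sfsfsfsf
    print(sandwich)  # sssfsfff
```

Under this layout every pattern keeps the same number of sublayers of each type (n of "s" and n of "f"); only their order changes.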

Source: Improving Transformer Models by Reordering their Sublayers

Tasks


Task                  Papers   Share
Language Modelling    1        33.33%
Machine Translation   1        33.33%
Translation           1        33.33%
