Multi-Unit Transformers for Neural Machine Translation

Transformer models achieve remarkable success in Neural Machine Translation. Many efforts have been devoted to deepening the Transformer by stacking several units (i.e., a combination of Multi-Head Attention and FFN) in a cascade, while the investigation of multiple parallel units has drawn little attention...
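To make the cascade-vs-parallel distinction concrete, here is a minimal PyTorch sketch of a layer that runs several Transformer units (each a multi-head attention plus FFN block) in parallel on the same input and averages their outputs. The unit definition and the averaging combination are illustrative assumptions, not the paper's exact architecture; the `TransformerUnit` and `MultiUnitLayer` names are hypothetical.

```python
import torch
import torch.nn as nn

class TransformerUnit(nn.Module):
    """One unit: multi-head self-attention followed by a feed-forward block,
    each with a residual connection and layer norm (standard Transformer sublayers)."""
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)          # self-attention over the sequence
        x = self.norm1(x + a)              # residual + norm
        x = self.norm2(x + self.ffn(x))    # FFN, residual + norm
        return x

class MultiUnitLayer(nn.Module):
    """Multiple units applied in PARALLEL to the same input; their outputs are
    averaged (one simple combination scheme, assumed here for illustration).
    A cascade would instead feed each unit's output into the next unit."""
    def __init__(self, n_units=3, d_model=64):
        super().__init__()
        self.units = nn.ModuleList(TransformerUnit(d_model) for _ in range(n_units))

    def forward(self, x):
        return torch.stack([u(x) for u in self.units]).mean(dim=0)

layer = MultiUnitLayer(n_units=3, d_model=64)
x = torch.randn(2, 10, 64)   # (batch, seq_len, d_model)
y = layer(x)
print(y.shape)               # same shape as the input: (2, 10, 64)
```

Because every unit sees the same input, the parallel units can be computed independently, whereas a cascade of the same units would be strictly sequential in depth.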
