Alleviating the Inequality of Attention Heads for Neural Machine Translation

21 Sep 2020 · Zewei Sun, Shu-Jian Huang, Xin-yu Dai, Jia-Jun Chen

Recent studies show that the attention heads in the Transformer are not equal. We relate this phenomenon to the imbalanced training of multi-head attention and to the model's dependence on specific heads...
