TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Machine Translation	IWSLT2015 Vietnamese-English	HeadMask (Random-18)	BLEU	26.85	# 1
Machine Translation	IWSLT2015 Vietnamese-English	HeadMask (Impt-18)	BLEU	26.36	# 2
Machine Translation	WMT2016 Romanian-English	HeadMask (Random-18)	BLEU score	32.85	# 12
Machine Translation	WMT2016 Romanian-English	HeadMask (Impt-18)	BLEU score	32.95	# 9
Machine Translation	WMT2017 Turkish-English	HeadMask (Random-18)	BLEU score	17.56	# 1
Machine Translation	WMT2017 Turkish-English	HeadMask (Impt-18)	BLEU score	17.48	# 2

Alleviating the Inequality of Attention Heads for Neural Machine Translation

COLING 2022 · Zewei Sun, Shu-Jian Huang, Xin-yu Dai, Jia-Jun Chen

Recent studies show that the attention heads in the Transformer are not equal. We relate this phenomenon to the imbalanced training of multi-head attention and the model's dependence on specific heads. To tackle this problem, we propose a simple masking method, HeadMask, in two specific variants. Experiments show translation improvements on multiple language pairs. Subsequent empirical analyses also support our assumption and confirm the effectiveness of the method.
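The abstract does not spell out how heads are masked, so the following is only a minimal sketch of one plausible reading: for each training step, a random subset of attention heads is chosen and their outputs are zeroed, so the model cannot lean on any fixed set of heads. The function names `sample_head_mask` and `apply_head_mask`, the choice of zeroing as the masking operation, and the numbers in the usage lines are all assumptions made for illustration; they are not taken from the paper.

```python
import random
from typing import List, Optional


def sample_head_mask(num_heads: int, num_masked: int,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Return a 0/1 keep-mask with exactly `num_masked` entries set to 0.

    Hypothetical helper: the head selection here is uniform at random.
    """
    if not 0 <= num_masked <= num_heads:
        raise ValueError("num_masked must be between 0 and num_heads")
    if rng is None:
        rng = random.Random()
    masked = set(rng.sample(range(num_heads), num_masked))
    return [0 if head in masked else 1 for head in range(num_heads)]


def apply_head_mask(head_outputs: List[List[float]],
                    keep_mask: List[int]) -> List[List[float]]:
    """Zero the output vector of every head whose keep-mask entry is 0.

    Assumption: "masking" a head means multiplying its output by zero.
    """
    if len(head_outputs) != len(keep_mask):
        raise ValueError("one mask entry is required per head")
    return [[value * keep for value in vector]
            for vector, keep in zip(head_outputs, keep_mask)]


# Usage: 8 heads with 2-dimensional outputs, 3 of them masked for this step.
rng = random.Random(0)
keep_mask = sample_head_mask(num_heads=8, num_masked=3, rng=rng)
head_outputs = [[1.0, 2.0] for _ in range(8)]
masked_outputs = apply_head_mask(head_outputs, keep_mask)
```

In a real Transformer the same keep-mask would be broadcast over the batch and sequence dimensions before the heads are concatenated. An importance-based variant would replace the random draw with a ranking of heads by some importance score, which this sketch leaves out.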

Tasks

Machine Translation

Translation

Datasets

WMT 2016

WMT 2016 News

IWSLT2015

Results from the Paper

Ranked #1 on Machine Translation on WMT2017 Turkish-English

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
