no code implementations • EMNLP 2021 • Pu-Chin Chen, Henry Tsai, Srinadh Bhojanapalli, Hyung Won Chung, Yin-Wen Chang, Chun-Sung Ferng
Our analysis shows that the gain actually comes from moving positional information to attention layer from the input.