no code implementations • 10 Jun 2023 • Shen-sian Syu, Juncheng Xie, Hung-Yi Lee
In our experiments, our model outperforms the baseline autoregressive model (Transformer base) on multiple datasets, including WMT'14 DE↔EN, WMT'16 RO↔EN, and IWSLT'14 DE↔EN.
Tasks: Language Modelling, Pretrained Multilingual Language Models, +1