no code implementations • 6 Nov 2022 • Qingyun Dou, Mark Gales
A deliberation network consists of multiple standard sequence-to-sequence models, each one conditioned on the initial input and the output of the previous model.
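The chaining described above can be sketched minimally: each stage is a function of the original input and the previous stage's output. The stand-in "models" below (`model_1`, `model_2`, `deliberate`) are hypothetical toy functions for illustration, not the paper's actual networks.

```python
# Toy sketch of a deliberation network: each stage k is conditioned on the
# initial input x and the output of stage k-1 (None for the first stage).
def model_1(x, y_prev):
    # first-pass "model": a toy transform of the input only
    return [w.upper() for w in x]

def model_2(x, y_prev):
    # second-pass "model": refines the previous stage's output
    # (a real deliberation stage would also attend over x)
    return [w + "!" for w in y_prev]

def deliberate(x, models):
    # run the stages in sequence, feeding each one the previous output
    y = None
    for f in models:
        y = f(x, y)
    return y

print(deliberate(["a", "b"], [model_1, model_2]))  # ['A!', 'B!']
```

The key design point is that every stage sees both the original input and the draft produced so far, so later stages can correct earlier mistakes rather than starting from scratch.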
no code implementations • 6 Nov 2022 • Qingyun Dou, Mark Gales
Attention forcing has been introduced to address the training-inference mismatch, guiding the model with the generated back-history and reference attention.
1 code implementation • 2 Apr 2021 • Qingyun Dou, Yiting Lu, Potsawee Manakul, Xixin Wu, Mark J. F. Gales
This approach guides the model with the generated output history and reference attention, and can reduce the training-inference mismatch without a schedule or a classifier.
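The mechanism above can be sketched numerically: at each step the decoder consumes its own generated output history, but the context vector is formed from a reference (teacher-forced) alignment, and the attention the model would have generated is kept for an attention loss. All shapes, weights, and the random reference alignment below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy dimensions and parameters (illustrative, not from the paper)
T_enc, T_dec, d = 5, 4, 8
enc = rng.normal(size=(T_enc, d))    # encoder hidden states
W_att = rng.normal(size=(d, d))      # toy attention projection
W_out = rng.normal(size=(d, d))      # toy output projection

def decoder_step(prev_out, ref_attn_t=None):
    """One decoder step over the generated history `prev_out`."""
    scores = enc @ (W_att @ prev_out)
    attn_gen = softmax(scores)            # alignment the model generates
    # Attention forcing: use the reference alignment to build the context,
    # while keeping the generated alignment for the attention loss.
    attn_used = ref_attn_t if ref_attn_t is not None else attn_gen
    context = attn_used @ enc
    out = np.tanh(W_out @ context)
    return out, attn_gen

# Reference attention from a teacher-forced pass (random stand-in here)
ref_attn = softmax(rng.normal(size=(T_dec, T_enc)))

# Free-running decode guided by the reference alignment at each step
y = np.zeros(d)
gen_attns = []
for t in range(T_dec):
    y, a = decoder_step(y, ref_attn_t=ref_attn[t])
    gen_attns.append(a)

# KL between reference and generated attention, averaged over steps;
# minimising this (plus the usual output loss) trains the model
kl = sum(np.sum(ref_attn[t] * np.log(ref_attn[t] / g))
         for t, g in enumerate(gen_attns)) / T_dec
```

Because the context is always built from the reference alignment, no annealing schedule or discriminator is needed to stabilise free-running training, which is the property the abstract highlights.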
no code implementations • 26 Sep 2019 • Qingyun Dou, Yiting Lu, Joshua Efiong, Mark J. F. Gales
This paper introduces attention forcing, which guides the model with generated output history and reference attention.