16 Oct 2023 • Makoto Yamada, Yuki Takezawa, Guillaume Houry, Kira Michaela Dusterwald, Deborah Sulem, Han Zhao, Yao-Hung Hubert Tsai
We find that model performance depends on the combination of tree-Wasserstein distance (TWD) and probability model, and that Jeffrey divergence regularization helps model training.
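
For readers unfamiliar with the term, the Jeffrey divergence between two discrete distributions p and q is the symmetrized KL divergence, KL(p‖q) + KL(q‖p), which simplifies to Σ_i (p_i − q_i) log(p_i / q_i). The following is a minimal sketch of that standard definition only. The function name `jeffrey_divergence` and the `eps` clipping are illustrative assumptions, and how the paper applies the term as a regularizer is not stated in this excerpt.

```python
import math


def jeffrey_divergence(p, q, eps=1e-12):
    """Symmetrized KL divergence: sum_i (p_i - q_i) * log(p_i / q_i).

    `eps` is an assumed numerical safeguard against log(0); it is not
    taken from the paper.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    # Clip to avoid log(0), then renormalize so both stay on the simplex.
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))
```

The value is zero when the two distributions coincide, is symmetric in its arguments, and grows as they move apart.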