ACL 2022 • Sangwon Yu, Jongyoon Song, Heeseung Kim, Seong-min Lee, Woo-Jong Ryu, Sungroh Yoon
AGG (adaptive gradient gating) addresses the representation degeneration problem of token embeddings by gating a specific part of the gradient for rare token embeddings.
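To illustrate the general idea, here is a minimal plain-Python sketch. It is not the authors' implementation. It assumes a softmax output layer with logits z_v = w_v · h, where the cross-entropy gradient with respect to embedding w_v is (p_v - y_v) h. The hypothetical `embedding_grad` function zeroes that term for rare tokens that are not the target in the current context, which is the part that pushes rare embeddings away from unrelated hidden states. The `rare_token_set` helper, its fixed frequency threshold, and the all-or-nothing gate are illustrative assumptions. The published method may differ both in which gradient terms it gates and in how it groups rare tokens.

```python
import math
from typing import Dict, List, Set


def rare_token_set(counts: Dict[int, int], threshold: float) -> Set[int]:
    """Hypothetical helper: token ids whose relative frequency is below threshold."""
    total = sum(counts.values())
    if total == 0:
        return set()
    return {tok for tok, c in counts.items() if c / total < threshold}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def embedding_grad(
    hidden: List[float],
    emb: List[List[float]],
    target: int,
    rare: Set[int],
    gate: bool = True,
) -> List[List[float]]:
    """Cross-entropy gradient w.r.t. each output embedding row for one context.

    With logits z_v = emb[v] . hidden, dL/d emb[v] = (p_v - y_v) * hidden.
    If gate is True, the non-target term for rare tokens is zeroed
    (a simplified, hypothetical gate, not the paper's exact rule).
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in emb]
    probs = softmax(logits)
    grads = []
    for v, p in enumerate(probs):
        coeff = p - (1.0 if v == target else 0.0)
        if gate and v in rare and v != target:
            # Block the update that would push a rare, non-target embedding away from hidden.
            coeff = 0.0
        grads.append([coeff * h for h in hidden])
    return grads


# Usage: token 2 is rare and not the target, so its row receives no gradient.
counts = {0: 90, 1: 9, 2: 1}
rare = rare_token_set(counts, threshold=0.05)
emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [1.0, 0.0]
gated = embedding_grad(hidden, emb, target=0, rare=rare)
plain = embedding_grad(hidden, emb, target=0, rare=rare, gate=False)
```

In this sketch, the rows for the target and for frequent tokens are identical with and without gating. Only the rare, non-target rows lose their gradient. When a rare token is itself the target, its gradient still flows, so the model can continue to learn that token.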
Language Modelling • Machine Translation