no code implementations • 29 Feb 2024 • Frederik Kunstner, Robin Yadav, Alan Milligan, Mark Schmidt, Alberto Bietti
We show that the heavy-tailed class imbalance found in language modeling tasks leads to difficulties in the optimization dynamics.
Language Modelling