1 code implementation • 21 Mar 2023 • William Merrill, Nikolaos Tsilivis, Aman Shukla
Grokking is a phenomenon in which a model trained on an algorithmic task first overfits but then, after a large amount of additional training, undergoes a phase transition and generalizes perfectly.