1 code implementation • 6 Oct 2023 • Edward J. Hu, Moksh Jain, Eric Elmoznino, Younesse Kaddar, Guillaume Lajoie, Yoshua Bengio, Nikolay Malkin
Autoregressive large language models (LLMs) compress knowledge from their training data through next-token conditional distributions.