1 code implementation • 2 Apr 2024 • David Raposo, Sam Ritter, Blake Richards, Timothy Lillicrap, Peter Conway Humphreys, Adam Santoro
Our method enforces a total compute budget by capping the number of tokens ($k$) that can participate in the self-attention and MLP computations at a given layer.
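The budget idea in the excerpt above can be illustrated with a minimal sketch. The excerpt does not say how the $k$ tokens are chosen, so this sketch assumes each token has a scalar score and that the $k$ highest-scoring tokens are selected. The names `top_k_indices`, `capped_layer`, and `block` are hypothetical. `block` stands in for the self-attention and MLP computation and is applied per token for simplicity, which real attention is not. Tokens outside the budget pass through unchanged.

```python
from typing import Callable, List

Vector = List[float]


def top_k_indices(scores: List[float], k: int) -> List[int]:
    """Return positions of the k highest scores, in original sequence order."""
    k = max(0, min(k, len(scores)))  # clamp k to the valid range
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def capped_layer(
    tokens: List[Vector],
    scores: List[float],
    k: int,
    block: Callable[[Vector], Vector],
) -> List[Vector]:
    """Apply `block` to at most k tokens; all other tokens skip the computation.

    Assumption: selection is by highest score, and `block` is a per-token
    stand-in for the layer's self-attention and MLP computation.
    """
    selected = set(top_k_indices(scores, k))
    output = []
    for i, x in enumerate(tokens):
        if i in selected:
            update = block(x)
            # Residual connection: add the block output to the input.
            output.append([a + b for a, b in zip(x, update)])
        else:
            # Token is outside the budget, so it is passed through as-is.
            output.append(list(x))
    return output


if __name__ == "__main__":
    tokens = [[1.0], [2.0], [3.0], [4.0]]
    scores = [0.1, 0.9, 0.3, 0.8]
    # Toy block that scales its input by 10; only 2 of the 4 tokens use it.
    print(capped_layer(tokens, scores, 2, lambda x: [v * 10 for v in x]))
    # prints [[1.0], [22.0], [3.0], [44.0]]
```

Because `block` runs on at most $k$ tokens regardless of sequence length, the cost of the layer in this sketch is bounded by $k$ rather than by the number of tokens.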
no code implementations • 29 Sep 2021 • Karol Gregor, Peter Conway Humphreys
We consider the problem of searching, end to end, for effective weight and activation update rules governing online learning of a recurrent network on problems of character sequence memorisation and prediction.