no code implementations • 23 Feb 2024 • Clement Neo, Shay B. Cohen, Fazl Barez
In this paper, we investigate the interplay between attention heads and specialized "next-token" neurons in the multilayer perceptron (MLP) that predict specific tokens.
2 code implementations • 4 Feb 2024 • Philip Quirke, Clement Neo, Fazl Barez
To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction.
no code implementations • 12 Oct 2023 • Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, Philip Torr, Fazl Barez
Large language models (LLMs) fine-tuned by reinforcement learning from human feedback (RLHF) are becoming more widely deployed.