Search Results for author: Clement Neo

Found 3 papers, 1 paper with code

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

no code implementations • 23 Feb 2024 • Clement Neo, Shay B. Cohen, Fazl Barez

In this paper, we investigate the interplay between attention heads and specialized "next-token" neurons in the multilayer perceptron (MLP) that predict specific tokens.

Increasing Trust in Language Models through the Reuse of Verified Circuits

1 code implementation • 4 Feb 2024 • Philip Quirke, Clement Neo, Fazl Barez

To demonstrate the reusability of verified modules, we insert the trained integer-addition model into an untrained model and train the combined model to perform both addition and subtraction.

Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models

no code implementations • 12 Oct 2023 • Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, Philip Torr, Fazl Barez

Large language models (LLMs) fine-tuned by reinforcement learning from human feedback (RLHF) are becoming more widely deployed.
