Search Results for author: Clement Neo

Found 3 papers, 1 paper with code

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions

no code implementations • 23 Feb 2024 • Clement Neo, Shay B. Cohen, Fazl Barez

In this paper, we investigate the interplay between attention heads and specialized "next-token" neurons in the Multilayer Perceptron that predict specific tokens.

Increasing Trust in Language Models through the Reuse of Verified Circuits

2 code implementations • 4 Feb 2024 • Philip Quirke, Clement Neo, Fazl Barez

To exhibit the reusability of verified modules, we insert the trained integer addition model into an untrained model and train the combined model to perform both addition and subtraction.

Beyond Training Objectives: Interpreting Reward Model Divergence in Large Language Models

no code implementations • 12 Oct 2023 • Luke Marks, Amir Abdullah, Clement Neo, Rauno Arike, Philip Torr, Fazl Barez

Large language models (LLMs) fine-tuned by reinforcement learning from human feedback (RLHF) are becoming more widely deployed.
