Search Results for author: Can Rager

Found 5 papers, 4 papers with code

Structured World Representations in Maze-Solving Transformers

1 code implementation · 5 Dec 2023 · Michael Igorevich Ivanitskiy, Alex F. Spies, Tilman Räuker, Guillaume Corlouer, Chris Mathwin, Lucia Quirke, Can Rager, Rusheb Shah, Dan Valentine, Cecilia Diniz Behn, Katsumi Inoue, Samy Wu Fung

Transformer models underpin many recent advances in practical machine learning applications, yet understanding their internal behavior continues to elude researchers.

Attribution Patching Outperforms Automated Circuit Discovery

1 code implementation · 16 Oct 2023 · Aaquib Syed, Can Rager, Arthur Conmy

Automated interpretability research has recently attracted attention as a potential research direction that could scale explanations of neural network behavior to large models.

An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l

No code implementations · 11 Oct 2023 · James Dao, Yeu-Tong Lau, Can Rager, Jett Janiak

That is, clearing residual stream directions set by earlier layers by reading in information and writing out the negative version.
