Search Results for author: James Dao

Found 2 papers, 0 papers with code

Position Paper: Toward New Frameworks for Studying Model Representations

no code implementations • 6 Feb 2024 • Satvik Golechha, James Dao

Mechanistic interpretability (MI) aims to understand AI models by reverse-engineering the exact algorithms neural networks learn.

An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l

no code implementations • 11 Oct 2023 • James Dao, Yeu-Tong Lau, Can Rager, Jett Janiak

Memory management here means clearing residual stream directions set by earlier layers: a later component reads in the information and writes out its negation.
