no code implementations • 6 Feb 2024 • Satvik Golechha, James Dao
Mechanistic interpretability (MI) aims to understand AI models by reverse-engineering the exact algorithms neural networks learn.
no code implementations • 11 Oct 2023 • James Dao, Yeu-Tong Lau, Can Rager, Jett Janiak
That is, later components clear residual stream directions set by earlier layers by reading in the information stored there and writing out its negation.
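The read-then-negate idea can be illustrated with a toy vector example. This is only a minimal sketch under stated assumptions, not the authors' implementation: the residual stream is modeled as a plain list of floats, the helper names `dot` and `erase_direction` are hypothetical, and the "direction" is a hand-picked vector rather than anything extracted from a real model.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def erase_direction(residual, direction):
    """Hypothetical clean-up step: read the component of `residual`
    along `direction`, then write its negation back into the stream.

    Returns (update, new_residual), where new_residual = residual + update.
    """
    norm = math.sqrt(dot(direction, direction))
    unit = [d / norm for d in direction]
    coeff = dot(residual, unit)           # "read": how much of the direction is present
    update = [-coeff * u for u in unit]   # "write": the negated component
    new_residual = [r + w for r, w in zip(residual, update)]
    return update, new_residual


# Toy residual stream (assumed values): some unrelated content plus
# a component written by an "earlier layer" along `direction`.
direction = [3.0, 4.0, 0.0]
unit = [0.6, 0.8, 0.0]
residual = [1.0 + 2.0 * 0.6, -2.0 + 2.0 * 0.8, 0.5]

update, cleaned = erase_direction(residual, direction)

# After the update, the stream has (numerically) no component along the direction.
print(abs(dot(cleaned, unit)) < 1e-9)
```

In this sketch the update is exactly the negative of the projection onto the chosen direction, so everything orthogonal to that direction is left untouched. In a real transformer the read and write would be carried out by learned weights, and the erasure would generally be only approximate.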