Search Results for author: Senthooran Rajamanoharan

Found 1 paper, 0 papers with code

Improving Dictionary Learning with Gated Sparse Autoencoders

no code implementations • 24 Apr 2024 • Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in the activations of language models (LMs), by finding sparse, linear reconstructions of those activations.
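As a rough illustration of "sparse, linear reconstructions", here is a minimal sketch of a standard ReLU sparse autoencoder. It is not the paper's code and not its gated architecture. It assumes the common formulation in which an activation vector x is encoded into a non-negative code f, reconstructed as a linear combination of dictionary directions (the decoder columns), and scored by squared reconstruction error plus an L1 penalty on f. The class name, the tied initialization, and the `l1_coeff` value are hypothetical choices, and no training loop is included.

```python
import random


def relu(v):
    # Elementwise ReLU; zeroing negatives is what makes the code sparse.
    return [max(0.0, a) for a in v]


def matvec(W, v):
    # W is stored as a list of rows; returns W @ v.
    return [sum(w * a for w, a in zip(row, v)) for row in W]


class SparseAutoencoder:
    """Hypothetical minimal SAE: f = ReLU(W_enc (x - b_dec) + b_enc), x_hat = W_dec f + b_dec."""

    def __init__(self, d_model, d_dict, seed=0):
        rng = random.Random(seed)
        # Encoder: d_dict rows x d_model columns (hypothetical small random init).
        self.W_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)] for _ in range(d_dict)]
        self.b_enc = [0.0] * d_dict
        # Decoder: d_model rows x d_dict columns; each column is one dictionary direction.
        # Initialized as the transpose of the encoder (an assumption, not a requirement).
        self.W_dec = [[self.W_enc[j][i] for j in range(d_dict)] for i in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        centered = [a - b for a, b in zip(x, self.b_dec)]
        pre = [p + b for p, b in zip(matvec(self.W_enc, centered), self.b_enc)]
        return relu(pre)

    def decode(self, f):
        # Linear reconstruction: weighted sum of dictionary directions plus a bias.
        return [r + b for r, b in zip(matvec(self.W_dec, f), self.b_dec)]

    def loss(self, x, l1_coeff=1e-3):
        # Reconstruction error plus L1 sparsity penalty; training would minimize this.
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
        sparsity = sum(abs(a) for a in f)
        return recon + l1_coeff * sparsity, f, x_hat
```

For example, `SparseAutoencoder(d_model=4, d_dict=8).loss([0.5, -1.0, 2.0, 0.0])` returns the scalar loss together with the 8-dimensional non-negative code and the 4-dimensional reconstruction. The dictionary is deliberately wider than the activation space (overcomplete), and the L1 term pushes most entries of f toward zero so each activation is explained by a few directions.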

Dictionary Learning

