Search Results for author: Logan Riggs

Found 1 papers, 1 papers with code

Sparse Autoencoders Find Highly Interpretable Features in Language Models

2 code implementations • 15 Sep 2023 • Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey

One hypothesised cause of polysemanticity is *superposition*, where neural networks represent more features than they have neurons by assigning features to an overcomplete set of directions in activation space, rather than to individual neurons.
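The superposition idea above can be sketched numerically. The following is a minimal toy illustration (not the paper's code): it assumes a layer with 4 neurons representing 10 features, each feature assigned a unit-norm direction in the 4-dimensional activation space, so that the activation vector for a sparse input is a linear combination of the active features' directions. All sizes and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, n_features = 4, 10  # more features than neurons

# Overcomplete dictionary: one unit-norm direction per feature.
directions = rng.normal(size=(n_features, n_neurons))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# A sparse input where only features 2 and 7 are active.
feature_coeffs = np.zeros(n_features)
feature_coeffs[[2, 7]] = [1.5, -0.8]

# The activation vector mixes the active directions, so each individual
# neuron responds to many features (polysemanticity).
activation = feature_coeffs @ directions
print(activation.shape)  # (4,)
```

Because the 10 directions cannot all be orthogonal in 4 dimensions, reading any single neuron conflates contributions from several features, which is the interpretability problem the paper's sparse autoencoders aim to undo.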

Tasks: counterfactual, Language Modelling, +1
