1 code implementation • 5 Dec 2023 • Miriam Rateike, Celia Cintas, John Wamburu, Tanya Akumu, Skyler Speakman
We introduce a weakly supervised auditing technique using a subset scanning approach to detect anomalous patterns in LLM activations from pre-trained models.
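To make the idea of subset scanning over activations concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a background set of activations from inputs considered normal, right-tailed empirical p-values per node, and the Berk-Jones scan statistic; the function names `empirical_pvalues`, `berk_jones`, and `scan_sample` are hypothetical, and random Gaussian vectors stand in for real LLM activations.

```python
import numpy as np

def empirical_pvalues(background, sample):
    """Right-tailed empirical p-value per node.

    background: (n_bg, n_nodes) activations from inputs assumed normal.
    sample:     (n_nodes,) activations of the input being audited.
    """
    n_bg = background.shape[0]
    # Count background activations at least as large as the sample's, per node.
    counts = (background >= sample).sum(axis=0)
    return (counts + 1) / (n_bg + 1)

def berk_jones(n_alpha, n, alpha):
    """Berk-Jones statistic: n * KL(n_alpha / n || alpha), zero if there is no excess."""
    frac = n_alpha / n
    if frac <= alpha:
        return 0.0
    kl = frac * np.log(frac / alpha)
    if frac < 1.0:
        kl += (1 - frac) * np.log((1 - frac) / (1 - alpha))
    return n * kl

def scan_sample(pvalues, alpha_max=0.5):
    """Find the node subset and significance threshold with the highest score.

    For one sample, the candidate subset at threshold alpha is every node
    whose p-value is at most alpha, so only the observed p-values need checking.
    """
    best_score, best_subset = 0.0, np.array([], dtype=int)
    for alpha in np.unique(pvalues[pvalues <= alpha_max]):
        subset = np.flatnonzero(pvalues <= alpha)
        score = berk_jones(len(subset), len(subset), alpha)
        if score > best_score:
            best_score, best_subset = score, subset
    return best_score, best_subset

# Synthetic stand-in for activations (assumption: 50 nodes, 500 background inputs).
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 50))
clean = rng.normal(size=50)
anomalous = clean.copy()
anomalous[:10] += 4.0  # shift a subset of nodes to mimic an anomalous pattern

clean_score, _ = scan_sample(empirical_pvalues(background, clean))
anom_score, anom_nodes = scan_sample(empirical_pvalues(background, anomalous))
```

In this toy setup the shifted sample scores higher than the clean one, and `anom_nodes` lists the nodes the scan flags. A real audit would replace the synthetic arrays with activations extracted from a chosen layer of the pre-trained model.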