1 code implementation • 9 Mar 2024 • Teun van der Weij, Massimo Poesio, Nandi Schoots
In this paper, we investigate the efficacy of activation steering for broad skills and multiple behaviours.
1 code implementation • 3 Jul 2023 • Teun van der Weij, Simon Lermen, Leon Lang
Recently, interest has grown in evaluating large language models for emergent and dangerous capabilities.