Search Results for author: Teun van der Weij

Found 2 papers, 2 papers with code

Extending Activation Steering to Broad Skills and Multiple Behaviours

1 code implementation • 9 Mar 2024 • Teun van der Weij, Massimo Poesio, Nandi Schoots

In this paper, we investigate the efficacy of activation steering for broad skills and multiple behaviours.

Evaluating Shutdown Avoidance of Language Models in Textual Scenarios

1 code implementation • 3 Jul 2023 • Teun van der Weij, Simon Lermen, Leon Lang

Recently, there has been an increase in interest in evaluating large language models for emergent and dangerous capabilities.
