Search Results for author: Kai Williams

Found 1 paper, 0 papers with code

Immunization against harmful fine-tuning attacks

no code implementations • 26 Feb 2024 • Domenic Rosati, Jan Wehner, Kai Williams, Łukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz

Approaches to aligning large language models (LLMs) with human values have focused on correcting misalignment that emerges from pretraining.
