Search Results for author: Kryštof Mitka

Found 1 paper, 1 paper with code

Competition Report: Finding Universal Jailbreak Backdoors in Aligned LLMs

2 code implementations · 22 Apr 2024 · Javier Rando, Francesco Croce, Kryštof Mitka, Stepan Shabalin, Maksym Andriushchenko, Nicolas Flammarion, Florian Tramèr

Large language models are aligned to be safe, preventing users from generating harmful content such as misinformation or instructions for illegal activities.

