no code implementations • 24 Apr 2024 • Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu
In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3.