Search Results for author: Ben Risher

Found 1 paper, 0 papers with code

Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

no code implementations • 24 Apr 2024 • Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3.
