Search Results for author: Ben Risher

Found 1 paper, 0 papers with code

Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

No code implementations · 24 Apr 2024 · Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

In a multi-turn setting, our threat model elevates the average attack success rate (ASR) to 86.2%, including a 99% leakage with GPT-4 and claude-1.3.
