Search Results for author: Sachin Vashistha

Found 1 paper, 1 paper with code

Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks

1 code implementation • 24 May 2023 • Abhinav Rao, Sachin Vashistha, Atharva Naik, Somak Aditya, Monojit Choudhury

Recent explorations with commercial Large Language Models (LLMs) have shown that non-expert users can jailbreak LLMs simply by manipulating their prompts, resulting in degenerate output behavior, privacy and security breaches, offensive outputs, and violations of content regulator policies.

