Search Results for author: Adib Hasan

Found 1 paper, 1 paper with code

Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning

1 code implementation • 19 Jan 2024 • Adib Hasan, Ileana Rugina, Alex Wang

Large Language Models (LLMs) are susceptible to "jailbreaking" prompts, which can induce the generation of harmful content.
