Search Results for author: Weikai Lu

Found 1 paper, 0 papers with code

Eraser: Jailbreaking Defense in Large Language Models via Unlearning Harmful Knowledge

no code implementations · 8 Apr 2024 · Weikai Lu, Ziqian Zeng, Jianwei Wang, Zhengdong Lu, Zelin Chen, Huiping Zhuang, Cen Chen

Jailbreaking attacks can enable Large Language Models (LLMs) to bypass their safeguards and generate harmful content.

General Knowledge
