Search Results for author: Zhichen Dong

Found 2 papers, 2 papers with code

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!

1 code implementation • 19 Feb 2024 • Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao

Large language models (LLMs) need to undergo safety alignment to ensure safe conversations with humans.

Language Modelling

Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey

1 code implementation • 14 Feb 2024 • Zhichen Dong, Zhanhui Zhou, Chao Yang, Jing Shao, Yu Qiao

Large Language Models (LLMs) are now commonplace in conversational applications.
