no code implementations • 8 Feb 2024 • Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang
Some jailbreak prompt datasets, available from the Internet, can also achieve high attack success rates on many LLMs, such as ChatGLM3, GPT-3.5, and PaLM2.
2 code implementations • 3 Jan 2023 • Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, Yang Zhang
A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset.
1 code implementation • 4 Feb 2021 • Yugeng Liu, Rui Wen, Xinlei He, Ahmed Salem, Zhikun Zhang, Michael Backes, Emiliano De Cristofaro, Mario Fritz, Yang Zhang
As a result, we lack a comprehensive picture of the risks caused by the attacks, e.g., the different scenarios they can be applied to, the common factors that influence their performance, the relationship among them, or the effectiveness of possible defenses.