Search Results for author: Liya Su

Found 2 papers, 0 papers with code

Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent

no code implementations6 May 2024 Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang

To demonstrate and address the underlying maliciousness, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack methodology named IntentObfuscator, exploiting this identified flaw by obfuscating the true intentions behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures.

Intent Detection

The Janus Interface: How Fine-Tuning in Large Language Models Amplifies the Privacy Risks

no code implementations24 Oct 2023 Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, ZiHao Wang, Liya Su, XiaoFeng Wang, Haixu Tang

In the attack, one can construct a PII association task, whereby an LLM is fine-tuned using a minuscule PII dataset, to potentially reinstate and reveal concealed PIIs.

Cannot find the paper you are looking for? You can Submit a new open access paper.