no code implementations • 6 May 2024 • Shang Shang, Xinqiang Zhao, Zhongjiang Yao, Yepeng Yao, Liya Su, Zijing Fan, Xiaodan Zhang, Zhengwei Jiang
To demonstrate and address this underlying vulnerability, we propose a theoretical hypothesis and analytical approach, and introduce a new black-box jailbreak attack method named IntentObfuscator, which exploits the identified flaw by obfuscating the true intent behind user prompts. This approach compels LLMs to inadvertently generate restricted content, bypassing their built-in content security measures.
no code implementations • 24 Oct 2023 • Xiaoyi Chen, Siyuan Tang, Rui Zhu, Shijun Yan, Lei Jin, ZiHao Wang, Liya Su, XiaoFeng Wang, Haixu Tang
In the attack, one constructs a PII association task in which an LLM is fine-tuned on a minuscule PII dataset, potentially reinstating and revealing concealed PIIs.