1 code implementation • 1 Apr 2024 • Seongmin Lee, Zijie J. Wang, Aishwarya Chakravarthy, Alec Helbling, Shengyun Peng, Mansi Phute, Duen Horng Chau, Minsuk Kahng
Our library offers a new way to quickly attribute an LLM's text generation to specific training data points, enabling users to inspect model behaviors, enhance model trustworthiness, and compare model-generated text with user-provided text.
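Below is a minimal sketch of what such training-data attribution can look like, assuming a gradient cosine-similarity score between a generated text and candidate training texts. The function names (`loss_grad`, `score_candidates`) and the use of `gpt2` are illustrative assumptions, not the library's actual API.

```python
# Hypothetical sketch of gradient-similarity training-data attribution.
# Names here are illustrative, NOT the LLM Attributor API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def loss_grad(model, tok, text):
    """Flattened gradient of the LM loss on `text` w.r.t. model parameters."""
    model.zero_grad()
    ids = tok(text, return_tensors="pt").input_ids
    out = model(ids, labels=ids)  # causal LM loss on the text itself
    out.loss.backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()
                      if p.grad is not None])

def score_candidates(model, tok, generated, train_texts):
    """Rank training texts by gradient cosine similarity to the generation."""
    g = loss_grad(model, tok, generated)
    scores = []
    for t in train_texts:
        h = loss_grad(model, tok, t)
        scores.append(torch.nn.functional.cosine_similarity(g, h, dim=0).item())
    return sorted(zip(train_texts, scores), key=lambda x: -x[1])

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
print(score_candidates(model, tok,
                       "The capital of France is Paris.",
                       ["Paris is the capital of France.",
                        "Bananas are rich in potassium."]))
```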
1 code implementation • 30 Aug 2023 • Shengyun Peng, Weilin Xu, Cory Cornelius, Matthew Hull, Kevin Li, Rahul Duggal, Mansi Phute, Jason Martin, Duen Horng Chau
Our research aims to reconcile the diverging opinions in existing work on how architectural components affect the adversarial robustness of CNNs.
1 code implementation • 14 Aug 2023 • Mansi Phute, Alec Helbling, Matthew Hull, Shengyun Peng, Sebastian Szyller, Cory Cornelius, Duen Horng Chau
We test LLM Self Defense on GPT-3.5 and Llama 2, two of the most prominent current LLMs, against various types of attacks, such as forcefully inducing affirmative responses to prompts and prompt-engineering attacks.
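Below is a minimal sketch of the self-examination idea behind LLM Self Defense: the model's candidate output is fed back to an LLM with a harm-screening question, and flagged responses are withheld. The prompt wording and the `openai` client usage are assumptions for illustration, not the paper's exact setup.

```python
# Sketch of LLM self-examination: a second pass asks the model whether
# its own output is harmful. Prompt wording is an illustrative assumption.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

HARM_CHECK = ("Does the following text contain harmful content? "
              "Answer with only 'Yes' or 'No'.\n\nText: {text}")

def generate(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def self_defended_generate(prompt: str) -> str:
    """Generate a response, then have the model screen it for harm."""
    candidate = generate(prompt)
    verdict = generate(HARM_CHECK.format(text=candidate))
    if verdict.strip().lower().startswith("yes"):
        return "[response withheld: flagged as potentially harmful]"
    return candidate
```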