no code implementations • 14 Dec 2023 • Tony T. Wang, Miles Wang, Kaivalya Hariharan, Nir Shavit
LLMs often face competing pressures (for example, helpfulness vs. harmlessness).
Adversarial Attack