no code implementations • 1 Apr 2024 • Luxi He, Mengzhou Xia, Peter Henderson
Current Large Language Models (LLMs), even those tuned for safety and alignment, are susceptible to jailbreaking.
no code implementations • NeurIPS 2023 • Hao Wang, Luxi He, Rui Gao, Flavio P. Calmon
We categorize sources of discrimination in the ML pipeline into two classes: aleatoric discrimination, which is inherent in the data distribution, and epistemic discrimination, which is due to decisions made during model development.