no code implementations • 24 Feb 2024 • Neal Mangaokar, Ashish Hooda, Jihye Choi, Shreyas Chandrashekaran, Kassem Fawaz, Somesh Jha, Atul Prakash
More recent LLMs often incorporate an additional layer of defense, a Guard Model: a second LLM designed to check and moderate the response of the primary LLM.
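The guard-model setup described above can be sketched minimally as a two-stage pipeline. This is an illustrative sketch, not the paper's implementation; the functions `generate` and `moderate` are hypothetical stand-ins for the primary LLM and the guard LLM.

```python
# Minimal sketch of a guard-model pipeline: a second model screens the
# primary model's response before it reaches the user. `generate` and
# `moderate` are illustrative stand-ins, not real LLM calls.

def generate(prompt: str) -> str:
    """Stand-in for the primary LLM."""
    return f"response to: {prompt}"

def moderate(response: str) -> bool:
    """Stand-in for the guard model: True if the response is deemed safe."""
    banned = ("how to build a weapon",)
    return not any(b in response.lower() for b in banned)

def guarded_generate(prompt: str, refusal: str = "I can't help with that.") -> str:
    """Return the primary model's response only if the guard approves it."""
    response = generate(prompt)
    return response if moderate(response) else refusal
```

The key design point is that the guard sees the *output* of the primary model, so it can catch unsafe completions regardless of how the prompt was phrased.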
no code implementations • 12 Oct 2023 • Jihye Choi, Shruti Tople, Varun Chandrasekaran, Somesh Jha
Many practical black-box MIAs require query access to the data distribution (the same distribution from which the private data is drawn) to train shadow models.
no code implementations • 28 Aug 2023 • Clark Barrett, Brad Boyd, Elie Burzstein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang
However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks.
no code implementations • 25 May 2023 • Zi Wang, Jihye Choi, Ke Wang, Somesh Jha
We note that the objective of testing DNNs is specific and well-defined: identifying inputs that lead to misclassifications.
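The testing objective stated above, finding inputs on which the model disagrees with the ground truth, can be illustrated with a minimal random-search tester. The model, oracle, and boundary values here are hypothetical stand-ins, not the paper's setup.

```python
import random

def oracle(x: float) -> int:
    """Ground-truth label: sign of x."""
    return 1 if x >= 0 else 0

def model(x: float) -> int:
    """Imperfect classifier with a shifted decision boundary at 0.3."""
    return 1 if x >= 0.3 else 0

def find_misclassified(n_trials: int = 1000, seed: int = 0):
    """Random testing: collect inputs where model and oracle disagree."""
    rng = random.Random(seed)
    return [x for x in (rng.uniform(-1.0, 1.0) for _ in range(n_trials))
            if model(x) != oracle(x)]

failures = find_misclassified()
# Every failure lies in the gap [0, 0.3) between the two boundaries.
```

More sophisticated testing methods replace the random sampler with gradient-guided or coverage-guided search, but the well-defined objective, inputs that trigger misclassification, stays the same.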
1 code implementation • 2 May 2023 • Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha
We theoretically analyze the stratified rejection setting and propose a novel defense method -- Adversarial Training with Consistent Prediction-based Rejection (CPR) -- for building a robust selective classifier.
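A selective classifier with prediction-consistency-based rejection can be sketched as follows. This is an illustrative simplification, not the paper's CPR method: the classifier rejects an input when predictions in its small neighborhood are inconsistent, i.e. a small perturbation flips the label.

```python
import random

def model(x: float) -> int:
    """Toy base classifier with a decision boundary at 0."""
    return 1 if x >= 0.0 else 0

def selective_predict(x: float, eps: float = 0.1, n_probe: int = 8, seed: int = 0):
    """Return the label, or None (reject) if predictions near x disagree."""
    rng = random.Random(seed)
    base = model(x)
    for _ in range(n_probe):
        if model(x + rng.uniform(-eps, eps)) != base:
            return None  # inconsistent neighborhood -> reject
    return base
```

Inputs far from the decision boundary are accepted, while inputs whose neighborhood straddles the boundary, precisely where adversarial perturbations do damage, are rejected rather than misclassified.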
no code implementations • 13 Sep 2022 • Varsha Pendyala, Jihye Choi
The interpretability of machine learning models has been an essential area of research for the safe deployment of machine learning systems.
1 code implementation • 4 Mar 2022 • Jihye Choi, Jayaram Raghuram, Ryan Feng, Jiefeng Chen, Somesh Jha, Atul Prakash
Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors.
no code implementations • AAAI Workshop AdvML 2022 • Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha
Motivated by this metric, we propose novel loss functions and a robust training method -- \textit{stratified adversarial training with rejection} (SATR) -- for a classifier with a reject option, where the goal is to accept and correctly classify inputs under small perturbations, while allowing the rejection of larger perturbations that cannot be correctly classified.
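The stratified accept/reject goal can be made concrete with a margin-based toy (a sketch of the *goal*, not of SATR itself): the classifier predicts only when the input sits a fixed margin away from the decision boundary, so small perturbations cannot flip an accepted prediction, while inputs pushed near the boundary are rejected instead of misclassified.

```python
def model(x: float) -> int:
    """Toy base classifier with a decision boundary at 0."""
    return 1 if x >= 0.0 else 0

def robust_selective_predict(x: float, margin: float = 0.2):
    """Predict only when x is at least `margin` from the boundary."""
    if abs(x) < margin:
        return None  # reject: a small perturbation could flip the label
    return model(x)
```

A clean input at x = 0.5 is accepted and remains accepted under any perturbation smaller than the margin; a perturbation large enough to push it near the boundary triggers rejection rather than an error.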
no code implementations • 21 Dec 2018 • Kanghoon Lee, Jihye Choi, Moonsu Cha, Jung-Kwon Lee, Tae-Yoon Kim
When training a machine learning model with observational data, one often encounters values that are systematically missing.
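To make the setting concrete, here is a mean-imputation baseline (not the paper's method). When values are missing systematically rather than at random, such simple imputation can bias the downstream model, which is what motivates learning-based treatments of missingness.

```python
def mean_impute(rows, missing=None):
    """Replace missing entries with the per-column mean of observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not missing]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is missing else r[j] for j in range(n_cols)]
            for r in rows]

data = [[1.0, 2.0], [3.0, None], [5.0, 4.0]]
filled = mean_impute(data)
```

Here the missing entry in the second row is filled with 3.0, the mean of the observed values 2.0 and 4.0 in that column.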