Search Results for author: Jihye Choi

Found 9 papers, 2 papers with code

PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails

no code implementations · 24 Feb 2024 · Neal Mangaokar, Ashish Hooda, Jihye Choi, Shreyas Chandrashekaran, Kassem Fawaz, Somesh Jha, Atul Prakash

More recent LLMs often incorporate an additional layer of defense, a Guard Model: a second LLM designed to check and moderate the output of the primary LLM.

Language Modelling Large Language Model
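The guarded pipeline described above can be sketched as follows. This is a hedged illustration only: `primary_llm` and `guard_llm` are hypothetical stand-in callables, not the paper's models or any real API.

```python
# Hypothetical sketch of a guard-model pipeline: a second "guard" LLM
# moderates the primary LLM's response before it reaches the user.
# Both callables below are toy placeholders, not real model APIs.

def primary_llm(prompt: str) -> str:
    # Placeholder for the primary model's generation step.
    return f"response to: {prompt}"

def guard_llm(response: str) -> bool:
    # Placeholder safety check: flag responses containing a blocked term.
    blocked_terms = {"attack", "exploit"}
    return not any(term in response.lower() for term in blocked_terms)

def guarded_generate(prompt: str) -> str:
    # Only release the response if the guard model approves it.
    response = primary_llm(prompt)
    if guard_llm(response):
        return response
    return "I cannot help with that."
```

The PRP attack studied in the paper targets exactly this two-model structure, propagating a perturbation so that it survives into the response the guard model checks.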

Why Train More? Effective and Efficient Membership Inference via Memorization

no code implementations · 12 Oct 2023 · Jihye Choi, Shruti Tople, Varun Chandrasekaran, Somesh Jha

Many practical black-box MIAs require query access to the data distribution (the same distribution from which the private data is drawn) in order to train shadow models.

Memorization
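For intuition, a common baseline membership inference attack (not the paper's method) simply thresholds the model's loss on a sample: memorized training points tend to have unusually low loss. The threshold and loss function below are illustrative assumptions.

```python
# Sketch of a loss-threshold membership inference baseline: samples on
# which the model's loss is unusually low are predicted to be members
# of the training set. The threshold value here is illustrative.
import math

def cross_entropy(prob_true_class: float) -> float:
    # Negative log-likelihood of the true class.
    return -math.log(prob_true_class)

def infer_membership(prob_true_class: float, threshold: float = 0.1) -> bool:
    # Predict "member" when the loss falls below the threshold.
    return cross_entropy(prob_true_class) < threshold
```

A confident prediction (probability near 1) yields a near-zero loss and is flagged as a likely member; an uncertain one is not.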

Rethinking Diversity in Deep Neural Network Testing

no code implementations · 25 May 2023 · Zi Wang, Jihye Choi, Ke Wang, Somesh Jha

We note that the objective of testing DNNs is specific and well-defined: identifying inputs that lead to misclassifications.

DNN Testing
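The stated objective, finding inputs that a model misclassifies, can be expressed as a minimal scan. This sketch uses a toy rule-based stand-in for the DNN; the helper names are hypothetical, not from the paper.

```python
# Toy sketch of the testing objective: scan candidate inputs and
# collect those the model misclassifies. `model` is a stand-in
# rule-based classifier, not a real DNN.

def model(x: float) -> int:
    # Stand-in classifier: predicts class 1 for non-negative inputs.
    return 1 if x >= 0 else 0

def find_misclassified(inputs, labels):
    # Return the inputs whose prediction disagrees with the label.
    return [x for x, y in zip(inputs, labels) if model(x) != y]
```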

Stratified Adversarial Robustness with Rejection

1 code implementation · 2 May 2023 · Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha

We theoretically analyze the stratified rejection setting and propose a novel defense method -- Adversarial Training with Consistent Prediction-based Rejection (CPR) -- for building a robust selective classifier.

Adversarial Robustness Robust classification
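The core interface of a selective classifier can be sketched with a simple confidence threshold. This is not CPR itself: the scoring rule and threshold below are illustrative assumptions standing in for the paper's consistent prediction-based rejection.

```python
# Minimal sketch of a classifier with a reject option: predict the
# argmax class, or reject (return None) when the normalized top score
# falls below a confidence threshold. Threshold is illustrative.

def selective_classify(scores: list[float], threshold: float = 0.7):
    total = sum(scores)
    probs = [s / total for s in scores]          # normalize to a distribution
    top = max(range(len(probs)), key=lambda i: probs[i])
    if probs[top] < threshold:
        return None                              # reject: not confident enough
    return top
```

In the stratified setting the paper analyzes, the goal is for small perturbations to still land in the "accept and classify correctly" region, while larger ones fall into rejection rather than misclassification.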

Concept-Based Explanations for Tabular Data

no code implementations · 13 Sep 2022 · Varsha Pendyala, Jihye Choi

The interpretability of machine learning models has been an essential area of research for the safe deployment of machine learning systems.

Attribute Fairness

Concept-based Explanations for Out-Of-Distribution Detectors

1 code implementation · 4 Mar 2022 · Jihye Choi, Jayaram Raghuram, Ryan Feng, Jiefeng Chen, Somesh Jha, Atul Prakash

Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors.

Out of Distribution (OOD) Detection
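As background, one widely used "off-the-shelf" OOD detection rule (the maximum-softmax-probability baseline, not this paper's contribution) flags inputs whose top softmax probability is low. The threshold here is an illustrative assumption.

```python
# Hedged example of an off-the-shelf OOD detection rule: flag an input
# as out-of-distribution when its maximum softmax probability is low
# (the classic MSP baseline). The threshold is illustrative.
import math

def max_softmax_score(logits: list[float]) -> float:
    # Numerically stable softmax; return the top probability.
    exps = [math.exp(l - max(logits)) for l in logits]
    return max(exps) / sum(exps)

def is_ood(logits: list[float], threshold: float = 0.5) -> bool:
    return max_softmax_score(logits) < threshold
```

The paper's framework then asks which high-level concepts such a detector's decisions can be attributed to, via its detection-completeness and concept-separability metrics.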

Revisiting Adversarial Robustness of Classifiers With a Reject Option

no code implementations · AAAI Workshop AdvML 2022 · Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha

Motivated by this metric, we propose novel loss functions and a robust training method -- stratified adversarial training with rejection (SATR) -- for a classifier with a reject option, where the goal is to accept and correctly classify small input perturbations, while allowing the rejection of larger input perturbations that cannot be correctly classified.

Adversarial Robustness Image Classification

Stochastic Doubly Robust Gradient

no code implementations · 21 Dec 2018 · Kanghoon Lee, Jihye Choi, Moonsu Cha, Jung-Kwon Lee, Tae-Yoon Kim

When training a machine learning model with observational data, one often encounters values that are systematically missing.

Fairness
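For intuition, a classic doubly robust estimator (here for a mean under missingness, a simplified stand-in rather than the paper's gradient estimator) combines an outcome-model prediction with an inverse-propensity correction on observed values; the estimate stays consistent if either model is correct.

```python
# Hedged sketch of a doubly robust mean estimate under missing data:
# use the outcome model's prediction for every sample, plus an
# inverse-propensity-weighted correction where the value was observed.
# All inputs here are assumed to come from toy, pre-fitted models.

def doubly_robust_mean(values, observed, predictions, propensities):
    # values[i] is valid only when observed[i] is True;
    # predictions[i] is the outcome model's guess, propensities[i]
    # the estimated probability that sample i is observed.
    total = 0.0
    for v, o, m, p in zip(values, observed, predictions, propensities):
        total += m + ((v - m) / p if o else 0.0)
    return total / len(values)
```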
