no code implementations • 24 Feb 2024 • Neal Mangaokar, Ashish Hooda, Jihye Choi, Shreyas Chandrashekaran, Kassem Fawaz, Somesh Jha, Atul Prakash
More recent LLMs often incorporate an additional layer of defense, a Guard Model: a second LLM designed to check and moderate the response of the primary LLM.
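The guard-model setup described above can be sketched minimally as a two-stage pipeline. This is an illustrative sketch, not the paper's implementation; the functions `generate` and `moderate` are hypothetical stand-ins for the primary LLM and the guard LLM.

```python
# Minimal sketch of a guard-model pipeline: a second model screens the
# primary model's response before it reaches the user. `generate` and
# `moderate` are illustrative stand-ins, not real LLM calls.

def generate(prompt: str) -> str:
    """Stand-in for the primary LLM."""
    return f"response to: {prompt}"

def moderate(response: str) -> bool:
    """Stand-in for the guard model: True if the response is deemed safe."""
    banned = ("how to build a weapon",)
    return not any(b in response.lower() for b in banned)

def guarded_generate(prompt: str, refusal: str = "I can't help with that.") -> str:
    """Return the primary model's response only if the guard approves it."""
    response = generate(prompt)
    return response if moderate(response) else refusal
```

The key design point is that the guard sees the *output* of the primary model, so it can catch unsafe completions regardless of how the prompt was phrased.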
no code implementations • 12 Oct 2023 • Jihye Choi, Shruti Tople, Varun Chandrasekaran, Somesh Jha
Many practical black-box MIAs require query access to the data distribution (the same distribution from which the private data is drawn) to train shadow models.
no code implementations • 28 Aug 2023 • Clark Barrett, Brad Boyd, Elie Burzstein, Nicholas Carlini, Brad Chen, Jihye Choi, Amrita Roy Chowdhury, Mihai Christodorescu, Anupam Datta, Soheil Feizi, Kathleen Fisher, Tatsunori Hashimoto, Dan Hendrycks, Somesh Jha, Daniel Kang, Florian Kerschbaum, Eric Mitchell, John Mitchell, Zulfikar Ramzan, Khawaja Shams, Dawn Song, Ankur Taly, Diyi Yang
However, GenAI can be used just as well by attackers to generate new attacks and increase the velocity and efficacy of existing attacks.
no code implementations • 25 May 2023 • Zi Wang, Jihye Choi, Ke Wang, Somesh Jha
We note that the objective of testing DNNs is specific and well-defined: identifying inputs that lead to misclassifications.
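The testing objective stated above, finding inputs on which the model disagrees with the ground truth, can be illustrated with a minimal random-search tester. The model, oracle, and boundary values here are hypothetical stand-ins, not the paper's setup.

```python
import random

def oracle(x: float) -> int:
    """Ground-truth label: sign of x."""
    return 1 if x >= 0 else 0

def model(x: float) -> int:
    """Imperfect classifier with a shifted decision boundary at 0.3."""
    return 1 if x >= 0.3 else 0

def find_misclassified(n_trials: int = 1000, seed: int = 0):
    """Random testing: collect inputs where model and oracle disagree."""
    rng = random.Random(seed)
    return [x for x in (rng.uniform(-1.0, 1.0) for _ in range(n_trials))
            if model(x) != oracle(x)]

failures = find_misclassified()
# Every failure lies in the gap [0, 0.3) between the two boundaries.
```

More sophisticated testing methods replace the random sampler with gradient-guided or coverage-guided search, but the well-defined objective, inputs that trigger misclassification, stays the same.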
1 code implementation • 2 May 2023 • Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha
We theoretically analyze the stratified rejection setting and propose a novel defense method -- Adversarial Training with Consistent Prediction-based Rejection (CPR) -- for building a robust selective classifier.
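A selective classifier with prediction-consistency-based rejection can be sketched as follows. This is an illustrative simplification, not the paper's CPR method: the classifier rejects an input when predictions in its small neighborhood are inconsistent, i.e. a small perturbation flips the label.

```python
import random

def model(x: float) -> int:
    """Toy base classifier with a decision boundary at 0."""
    return 1 if x >= 0.0 else 0

def selective_predict(x: float, eps: float = 0.1, n_probe: int = 8, seed: int = 0):
    """Return the label, or None (reject) if predictions near x disagree."""
    rng = random.Random(seed)
    base = model(x)
    for _ in range(n_probe):
        if model(x + rng.uniform(-eps, eps)) != base:
            return None  # inconsistent neighborhood -> reject
    return base
```

Inputs far from the decision boundary are accepted, while inputs whose neighborhood straddles the boundary, precisely where adversarial perturbations do damage, are rejected rather than misclassified.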
no code implementations • 13 Sep 2022 • Varsha Pendyala, Jihye Choi
The interpretability of machine learning models has been an essential area of research for the safe deployment of machine learning systems.
1 code implementation • 4 Mar 2022 • Jihye Choi, Jayaram Raghuram, Ryan Feng, Jiefeng Chen, Somesh Jha, Atul Prakash
Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors.
no code implementations • AAAI Workshop AdvML 2022 • Jiefeng Chen, Jayaram Raghuram, Jihye Choi, Xi Wu, Yingyu Liang, Somesh Jha
Motivated by this metric, we propose novel loss functions and a robust training method -- \textit{stratified adversarial training with rejection} (SATR) -- for a classifier with a reject option, where the goal is to accept and correctly classify inputs under small perturbations, while allowing the rejection of larger perturbations that cannot be correctly classified.
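The stratified accept/reject goal can be made concrete with a margin-based toy (a sketch of the *goal*, not of SATR itself): the classifier predicts only when the input sits a fixed margin away from the decision boundary, so small perturbations cannot flip an accepted prediction, while inputs pushed near the boundary are rejected instead of misclassified.

```python
def model(x: float) -> int:
    """Toy base classifier with a decision boundary at 0."""
    return 1 if x >= 0.0 else 0

def robust_selective_predict(x: float, margin: float = 0.2):
    """Predict only when x is at least `margin` from the boundary."""
    if abs(x) < margin:
        return None  # reject: a small perturbation could flip the label
    return model(x)
```

A clean input at x = 0.5 is accepted and remains accepted under any perturbation smaller than the margin; a perturbation large enough to push it near the boundary triggers rejection rather than an error.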
no code implementations • 21 Dec 2018 • Kanghoon Lee, Jihye Choi, Moonsu Cha, Jung-Kwon Lee, Tae-Yoon Kim
When training a machine learning model with observational data, one often encounters values that are systematically missing.
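To make the setting concrete, here is a mean-imputation baseline (not the paper's method). When values are missing systematically rather than at random, such simple imputation can bias the downstream model, which is what motivates learning-based treatments of missingness.

```python
def mean_impute(rows, missing=None):
    """Replace missing entries with the per-column mean of observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not missing]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is missing else r[j] for j in range(n_cols)]
            for r in rows]

data = [[1.0, 2.0], [3.0, None], [5.0, 4.0]]
filled = mean_impute(data)
```

Here the missing entry in the second row is filled with 3.0, the mean of the observed values 2.0 and 4.0 in that column.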