Search Results for author: Jianing Zhu

Found 9 papers, 7 papers with code

DeepInception: Hypnotize Large Language Model to Be Jailbreaker

1 code implementation • 6 Nov 2023 • Xuan Li, Zhanke Zhou, Jianing Zhu, Jiangchao Yao, Tongliang Liu, Bo Han

Despite remarkable success in various applications, large language models (LLMs) are vulnerable to adversarial jailbreaks that render their safety guardrails void.

Language Modelling • Large Language Model

Exploring Model Dynamics for Accumulative Poisoning Discovery

1 code implementation • 6 Jun 2023 • Jianing Zhu, Xiawei Guo, Jiangchao Yao, Chao Du, Li He, Shuo Yuan, Tongliang Liu, Liang Wang, Bo Han

In this paper, we take the perspective of model dynamics and propose a novel information measure, Memorization Discrepancy, to explore defense via model-level information.

Memorization
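
A loose sketch of the model-dynamics idea, under the simplest possible reading: compare how the current model and an earlier snapshot score an incoming batch, and flag batches whose discrepancy spikes. The KL-based measure and the names below are illustrative stand-ins, not the paper's exact definition of Memorization Discrepancy.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def memorization_discrepancy(model_now, model_past, batch):
    """Mean KL(past || now) over a batch (hypothetical form of the measure)."""
    log_p_now = F.log_softmax(model_now(batch), dim=1)
    p_past = F.softmax(model_past(batch), dim=1)
    # F.kl_div takes log-probabilities as input and probabilities as target
    return F.kl_div(log_p_now, p_past, reduction="batchmean")

# Usage idea: keep a snapshot from k steps ago; a batch whose discrepancy
# exceeds a running threshold is held out as potentially poisoned.
```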

Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability

1 code implementation • 6 Jun 2023 • Jianing Zhu, Hengzhuang Li, Jiangchao Yao, Tongliang Liu, Jianliang Xu, Bo Han

Based on such insights, we propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data.

Out-of-Distribution Detection
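
For context, the OOD discriminative capability this paper aims to restore is usually scored post hoc; a standard baseline is the maximum softmax probability (MSP) score of Hendrycks & Gimpel. Unleashing Mask itself modifies the trained model, which is not shown here; this sketch only illustrates how the restored capability would typically be evaluated.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_score(model, x):
    """Higher score => more likely in-distribution (ID)."""
    return F.softmax(model(x), dim=1).max(dim=1).values

# Inputs scoring below a threshold chosen on ID validation data
# (e.g., at 95% true-positive rate) are rejected as OOD.
```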

Combating Exacerbated Heterogeneity for Robust Models in Federated Learning

1 code implementation • 1 Mar 2023 • Jianing Zhu, Jiangchao Yao, Tongliang Liu, Quanming Yao, Jianliang Xu, Bo Han

Privacy and security concerns in real-world applications have led to the development of adversarially robust federated models.

Federated Learning

$\alpha$-Weighted Federated Adversarial Training

no code implementations • 29 Sep 2021 • Jianing Zhu, Jiangchao Yao, Tongliang Liu, Kunyang Jia, Jingren Zhou, Bo Han, Hongxia Yang

Federated Adversarial Training (FAT) helps address data privacy and governance issues while maintaining model robustness to adversarial attacks.

Adversarial Attack • Federated Learning
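
A minimal sketch of the federated adversarial training loop described above: each client trains on PGD adversarial examples locally, and the server aggregates the client models. The fixed `alphas` in the aggregation are a placeholder for the paper's α-weighting scheme, whose exact form is not given in this snippet.

```python
import copy
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step=2/255, iters=10):
    """Untargeted L-inf PGD (hyperparameters illustrative)."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(iters):
        x_adv = x_adv.detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project
    return x_adv.detach()

def fat_round(global_model, client_loaders, alphas, lr=0.1):
    """One communication round; assumes float-only model state."""
    states = []
    for loader in client_loaders:
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(pgd_attack(model, x, y)), y).backward()
            opt.step()
        states.append(model.state_dict())
    # alpha-weighted aggregation (alphas assumed to sum to 1)
    merged = {k: sum(a * s[k] for a, s in zip(alphas, states))
              for k in states[0]}
    global_model.load_state_dict(merged)
```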

Reliable Adversarial Distillation with Unreliable Teachers

2 code implementations • ICLR 2022 • Jianing Zhu, Jiangchao Yao, Bo Han, Jingfeng Zhang, Tongliang Liu, Gang Niu, Jingren Zhou, Jianliang Xu, Hongxia Yang

However, when considering adversarial robustness, teachers may become unreliable, and adversarial distillation may not work: teachers are pretrained on their own adversarial data, and it is too demanding to require that they also perform well on every adversarial example queried by students.

Adversarial Robustness
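
A minimal sketch of the adversarial distillation setting the abstract describes: the student is trained on its own adversarial examples to match the teacher's soft predictions. The paper's point is that the teacher should be trusted only partially on such examples; the fixed interpolation weight `lam` below is a stand-in for its per-example reliability weighting.

```python
import torch
import torch.nn.functional as F

def adv_distillation_loss(student, teacher, x_adv, y, lam=0.5, T=1.0):
    s_logits = student(x_adv)
    with torch.no_grad():
        t_probs = F.softmax(teacher(x_adv) / T, dim=1)
    # Distillation term: match the teacher's soft labels on x_adv
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1), t_probs,
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(s_logits, y)
    # Weight shifts toward the hard-label term when the teacher is
    # unreliable on x_adv (the paper makes this example-dependent).
    return lam * kd + (1 - lam) * ce
```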

Understanding the Interaction of Adversarial Training with Noisy Labels

no code implementations • 6 Feb 2021 • Jianing Zhu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Hongxia Yang, Mohan Kankanhalli, Masashi Sugiyama

A recent adversarial training (AT) study showed that the number of projected gradient descent (PGD) steps to successfully attack a point (i.e., find an adversarial example in its proximity) is an effective measure of the robustness of this point.
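
The measure described above is directly implementable: run PGD from a clean point and count the steps until the model is first fooled. A minimal sketch for a single example (untargeted L-inf PGD; hyperparameters illustrative):

```python
import torch
import torch.nn.functional as F

def pgd_steps_to_attack(model, x, y, eps=8/255, step=2/255, max_iters=20):
    """Steps needed to flip the prediction of one example (batch size 1);
    returns max_iters if the attack never succeeds."""
    x_adv = x.clone()
    for k in range(1, max_iters + 1):
        x_adv = x_adv.detach().requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        x_adv = x_adv.detach() + step * grad.sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)  # project
        with torch.no_grad():
            if model(x_adv).argmax(dim=1).item() != y.item():
                return k  # fewer steps => geometrically less robust point
    return max_iters
```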

Geometry-aware Instance-reweighted Adversarial Training

2 code implementations • ICLR 2021 • Jingfeng Zhang, Jianing Zhu, Gang Niu, Bo Han, Masashi Sugiyama, Mohan Kankanhalli

A common belief held that robustness and accuracy hurt each other; this belief was challenged by recent studies showing that we can maintain the robustness and improve the accuracy.
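
A minimal sketch of the geometry-aware reweighting idea: examples fooled in fewer PGD steps (small kappa, i.e., closer to the decision boundary) receive larger weight in the adversarial training loss. The tanh-based weighting follows the form reported in the GAIRAT paper; treat the constants as illustrative.

```python
import torch
import torch.nn.functional as F

def gairat_weights(kappa, K, lam=-1.0):
    """kappa: per-example PGD steps needed to attack; K: max PGD steps."""
    return (1 + torch.tanh(lam + 5 * (1 - 2 * kappa.float() / K))) / 2

def reweighted_adv_loss(model, x_adv, y, kappa, K):
    w = gairat_weights(kappa, K)
    w = w / w.sum()  # normalize weights within the batch
    per_example = F.cross_entropy(model(x_adv), y, reduction="none")
    return (w * per_example).sum()
```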
