Search Results for author: Sizhe Chen

Found 17 papers, 10 papers with code

Jatmo: Prompt Injection Defense by Task-Specific Finetuning

1 code implementation29 Dec 2023 Julien Piet, Maha Alrashed, Chawin Sitawarin, Sizhe Chen, Zeming Wei, Elizabeth Sun, Basel Alomair, David Wagner

Jatmo needs only a task prompt and a dataset of inputs for the task: it uses the teacher model to generate the corresponding outputs.

Instruction Following
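
A minimal sketch of the pipeline this entry describes: label a dataset of task inputs with a teacher model, then fine-tune a task-specific model on the resulting pairs. The helper names (call_teacher, finetune_base_model) are hypothetical placeholders, not Jatmo's actual API.

```python
# Sketch only: `call_teacher` and `finetune_base_model` are hypothetical
# placeholders for whatever teacher LLM and fine-tuning stack is used.

def build_task_dataset(task_prompt, inputs, call_teacher):
    """Label each raw input by querying the teacher model with the task prompt."""
    dataset = []
    for x in inputs:
        y = call_teacher(f"{task_prompt}\n\n{x}")  # teacher generates the output
        dataset.append({"input": x, "output": y})
    return dataset

def train_task_specific_model(task_prompt, inputs, call_teacher, finetune_base_model):
    data = build_task_dataset(task_prompt, inputs, call_teacher)
    # The task-specific model is fine-tuned on input->output pairs alone, so it
    # never has to interpret natural-language instructions at inference time,
    # which is what makes injected instructions in the input less effective.
    return finetune_base_model(data)
```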

Can LLMs Follow Simple Rules?

1 code implementation6 Nov 2023 Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Basel Alomair, Dan Hendrycks, David Wagner

As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner.

Investigating Catastrophic Overfitting in Fast Adversarial Training: A Self-fitting Perspective

no code implementations · 23 Feb 2023 · Zhengbao He, Tao Li, Sizhe Chen, Xiaolin Huang

Based on self-fitting, we provide new insights into existing methods for mitigating CO and extend CO to multi-step adversarial training.

Self-Learning
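
For context, a minimal sketch of the fast (single-step) adversarial training step in which catastrophic overfitting is observed. Hyperparameters are common defaults, not the paper's, and this is not the paper's code.

```python
import torch
import torch.nn.functional as F

def fast_at_step(model, x, y, optimizer, eps=8/255, alpha=10/255):
    """One update of single-step (FGSM-based) adversarial training."""
    # Random start inside the L_inf ball, then a single FGSM step on the perturbation.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
    # Train the network on the resulting adversarial examples.
    adv_loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
    optimizer.zero_grad()
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
```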

Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors

1 code implementation22 Nov 2022 Sizhe Chen, Geng Yuan, Xinwen Cheng, Yifan Gong, Minghai Qin, Yanzhi Wang, Xiaolin Huang

In this paper, we uncover them using model checkpoints' gradients, forming the proposed self-ensemble protection (SEP). SEP is very effective because (1) learning on examples ignored during normal training tends to yield DNNs that ignore normal examples, and (2) checkpoints' cross-model gradients are close to orthogonal, meaning they are as diverse as DNNs with different architectures.
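
A hedged sketch of the checkpoint-ensembling idea: gradient signal for the protective perturbation is drawn from several saved training checkpoints rather than a single final model. The objective, step size, and iteration count below are placeholders, not SEP's actual settings.

```python
import torch
import torch.nn.functional as F

def self_ensemble_perturbation(checkpoints, x, y, eps=8/255, step=2/255, iters=10):
    """Craft a data-protecting perturbation by cycling over model checkpoints."""
    delta = torch.zeros_like(x, requires_grad=True)
    for i in range(iters):
        model = checkpoints[i % len(checkpoints)]  # a different checkpoint each step
        loss = F.cross_entropy(model((x + delta).clamp(0, 1)), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Gradient-sign update, projected back into the L_inf ball.
        delta = (delta.detach() + step * grad.sign()).clamp(-eps, eps).requires_grad_(True)
    return delta.detach()
```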

Unifying Gradients to Improve Real-world Robustness for Deep Networks

1 code implementation12 Aug 2022 Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

The wide application of deep neural networks (DNNs) demands increasing attention to their real-world robustness, i.e., whether a DNN resists black-box adversarial attacks, among which score-based query attacks (SQAs) are the most threatening since they can effectively hurt a victim network with access only to model outputs.
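
To make the threat model concrete, a toy random-search score-based query attack; this is the attack family being defended against here, not the paper's defense or code. It assumes x is a single image in [0, 1] and that score_fn returns a 1-D tensor of class scores.

```python
import torch

@torch.no_grad()
def toy_score_based_attack(score_fn, x, y, eps=8/255, queries=100):
    """Lower the true-class score using only query access to output scores."""
    x_adv = x.clone()
    best = score_fn(x_adv)[y]  # true-class score; the attacker wants it lower
    for _ in range(queries):
        probe = x_adv + eps * torch.randn_like(x).sign()
        probe = probe.clamp(x - eps, x + eps).clamp(0, 1)  # stay in the L_inf ball and valid range
        s = score_fn(probe)[y]
        if s < best:  # keep the probe only if the score dropped
            x_adv, best = probe, s
    return x_adv
```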

One-Pixel Shortcut: on the Learning Preference of Deep Neural Networks

1 code implementation24 May 2022 Shutong Wu, Sizhe Chen, Cihang Xie, Xiaolin Huang

Based on OPS, we introduce an unlearnable dataset called CIFAR-10-S, which is indistinguishable from CIFAR-10 to humans but drives trained models to extremely low accuracy.

Data Augmentation
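
A sketch of the one-pixel-shortcut idea, assuming each class has already been assigned a fixed pixel position and color; the paper's actual search for those positions and colors is not reproduced here.

```python
import numpy as np

def apply_one_pixel_shortcut(images, labels, positions, colors):
    """Stamp a class-specific pixel onto every image so the label becomes
    predictable from that single pixel (the shortcut a network latches onto).

    images: uint8 array (N, H, W, 3); labels: int array (N,)
    positions[c] = (row, col) and colors[c] = (r, g, b) for each class c.
    """
    out = images.copy()
    for i, c in enumerate(labels):
        row, col = positions[c]
        out[i, row, col] = colors[c]
    return out
```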

Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks

1 code implementation24 May 2022 Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang

Score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, using only the model's output scores.

Adversarial Attack
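
A toy illustration of the output post-processing idea the title points to: return lightly perturbed scores so score-based query attackers receive misleading feedback while the prediction itself is unchanged. This is not the paper's AAA algorithm, and the noise scale is an arbitrary placeholder.

```python
import torch

def postprocess_scores(logits, scale=0.5):
    """Perturb returned scores without changing the top-1 prediction."""
    out = logits + scale * torch.randn_like(logits)
    # If the noise flipped the top-1 class, push the original top class back on top.
    orig_top = logits.argmax(dim=-1)
    flipped = (out.argmax(dim=-1) != orig_top).nonzero(as_tuple=True)[0]
    out[flipped, orig_top[flipped]] = out[flipped].max(dim=-1).values + 1e-3
    return out
```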

Subspace Adversarial Training

1 code implementation · CVPR 2022 · Tao Li, Yingwen Wu, Sizhe Chen, Kun Fang, Xiaolin Huang

Single-step adversarial training (AT) has received wide attention as it has proved to be both efficient and robust.

Dominant Patterns: Critical Features Hidden in Deep Neural Networks

no code implementations · 31 May 2021 · Zhixing Ye, Shaofei Qin, Sizhe Chen, Xiaolin Huang

As the name suggests, if we add a DNN's dominant pattern to a natural image, the output of that DNN is determined by the dominant pattern rather than the original image, i.e., the DNN's prediction is the same as its prediction on the dominant pattern alone.
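
A small check of the property described above, assuming a dominant pattern has already been found for the model (the search procedure is not shown) and that images lie in [0, 1].

```python
import torch

@torch.no_grad()
def dominance_rate(model, pattern, images):
    """Fraction of images whose prediction, after adding the pattern, matches
    the prediction on the pattern alone; close to 1.0 means the pattern dominates."""
    pattern_pred = model(pattern.unsqueeze(0)).argmax(dim=-1)          # prediction for the pattern itself
    mixed_pred = model((images + pattern).clamp(0, 1)).argmax(dim=-1)  # predictions for image + pattern
    return (mixed_pred == pattern_pred).float().mean().item()
```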

Query Attack by Multi-Identity Surrogates

2 code implementations · 31 May 2021 · Sizhe Chen, Zhehao Huang, Qinghua Tao, Xiaolin Huang

Deep Neural Networks (DNNs) are acknowledged to be vulnerable to adversarial attacks, but existing black-box attacks require extensive queries to the victim DNN to achieve high success rates.

Measuring the Transferability of $\ell_\infty$ Attacks by the $\ell_2$ Norm

no code implementations · 20 Feb 2021 · Sizhe Chen, Qinghua Tao, Zhixing Ye, Xiaolin Huang

Deep neural networks can be fooled by adversarial examples that differ only trivially from the original samples.

Relevance Attack on Detectors

1 code implementation16 Aug 2020 Sizhe Chen, Fan He, Xiaolin Huang, Kun Zhang

This paper focuses on highly transferable adversarial attacks on detectors, which are hard to attack in a black-box manner because of their multiple-output characteristics and the diversity across architectures.

Autonomous Driving · Instance Segmentation +4

Double Backpropagation for Training Autoencoders against Adversarial Attack

no code implementations · 4 Mar 2020 · Chengjin Sun, Sizhe Chen, Xiaolin Huang

We restrict the gradient from the reconstructed image to the original one so that the autoencoder is not sensitive to the trivial perturbations produced by adversarial attacks.

Adversarial Attack · Robust Classification
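
A minimal sketch of the gradient restriction described above via double backpropagation, assuming a standard PyTorch autoencoder; the penalty weight is a placeholder, not the paper's value.

```python
import torch
import torch.nn.functional as F

def dbp_autoencoder_loss(autoencoder, x, lam=0.1):
    """Reconstruction loss plus a penalty on the gradient of that loss w.r.t.
    the input, so small input perturbations cannot move the reconstruction much."""
    x = x.detach().clone().requires_grad_(True)
    recon = autoencoder(x)
    rec_loss = F.mse_loss(recon, x.detach())  # gradient flows only through the reconstruction
    # create_graph=True keeps this gradient differentiable, so the penalty itself
    # can be backpropagated through -- the "double" in double backpropagation.
    grad = torch.autograd.grad(rec_loss, x, create_graph=True)[0]
    return rec_loss + lam * grad.pow(2).sum()
```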

Type I Attack for Generative Models

no code implementations · 4 Mar 2020 · Chengjin Sun, Sizhe Chen, Jia Cai, Xiaolin Huang

To implement the Type I attack, we destroy the original example by increasing its distance in input space while keeping the output similar, since different inputs may correspond to similar features owing to the properties of deep neural networks.

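A sketch of the stated objective, maximizing input-space distance while keeping the network's output similar. Step counts and weights are placeholders, and feature_fn stands in for whatever feature/output map of the model is used.

```python
import torch

def type_i_attack_sketch(feature_fn, x, steps=100, lr=0.01, lam=10.0):
    """Push x_adv far from x in input space while keeping its features close to x's."""
    x_adv = x.detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    f0 = feature_fn(x).detach()
    for _ in range(steps):
        input_dist = (x_adv - x).pow(2).mean()               # to be maximized
        feat_dist = (feature_fn(x_adv) - f0).pow(2).mean()   # to be kept small
        loss = -input_dist + lam * feat_dist
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            x_adv.clamp_(0, 1)                               # keep a valid image
    return x_adv.detach()
```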

HRFA: High-Resolution Feature-based Attack

no code implementations · 21 Jan 2020 · Zhixing Ye, Sizhe Chen, Peidong Zhang, Chengjin Sun, Xiaolin Huang

Adversarial attacks have long been developed to reveal the vulnerability of Deep Neural Networks (DNNs) by adding imperceptible perturbations to the input.

Denoising · Face Verification +1

Universal Adversarial Attack on Attention and the Resulting Dataset DAmageNet

no code implementations · 16 Jan 2020 · Sizhe Chen, Zhengbao He, Chengjin Sun, Jie Yang, Xiaolin Huang

AoA enjoys a significant increase in transferability when the traditional cross-entropy loss is replaced with the attention loss.

Adversarial Attack
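
A hedged sketch of the loss swap described above, with a plain input-gradient saliency map standing in for the paper's attention maps (an assumption; AoA's attention loss is defined differently). An iterative transfer attack would descend this loss in place of ascending cross-entropy, leaving the rest of the attack unchanged.

```python
import torch

def attention_loss_sketch(model, x, y):
    """Attention the model pays to the true class, approximated by an
    input-gradient saliency map; the attacker minimizes this instead of
    maximizing cross-entropy."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    true_logit = logits.gather(1, y.unsqueeze(1)).sum()
    # create_graph=True keeps the saliency differentiable so an iterative
    # attack can backpropagate through it.
    saliency = torch.autograd.grad(true_logit, x, create_graph=True)[0].abs()
    return saliency.mean()
```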

DAmageNet: A Universal Adversarial Dataset

1 code implementation16 Dec 2019 Sizhe Chen, Xiaolin Huang, Zhengbao He, Chengjin Sun

Adversarial samples are similar to clean ones, yet they are able to fool the attacked DNN into producing incorrect predictions with high confidence.

Adversarial Attack
