Search Results for author: Vikash Sehwag

Found 22 papers, 10 papers with code

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

1 code implementation • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, Eric Wong

To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work -- which align with OpenAI's usage policies; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs.

Finding needles in a haystack: A Black-Box Approach to Invisible Watermark Detection

no code implementations • 23 Mar 2024 • Minzhou Pan, Zhenting Wang, Xin Dong, Vikash Sehwag, Lingjuan Lyu, Xue Lin

In this paper, we propose WaterMark Detection (WMD), the first invisible watermark detection method under a black-box and annotation-free setting.

Scaling Compute Is Not All You Need for Adversarial Robustness

no code implementations • 20 Dec 2023 • Edoardo Debenedetti, Zishen Wan, Maksym Andriushchenko, Vikash Sehwag, Kshitij Bhardwaj, Bhavya Kailkhura

Finally, we make our benchmarking framework (built on top of timm (Wightman, 2019)) publicly available to facilitate future analysis in efficient robust deep learning.

Adversarial Robustness, Benchmarking

MultiRobustBench: Benchmarking Robustness Against Multiple Attacks

no code implementations • 21 Feb 2023 • Sihui Dai, Saeed Mahloujifar, Chong Xiang, Vikash Sehwag, Pin-Yu Chen, Prateek Mittal

Using our framework, we present the first leaderboard, MultiRobustBench, for benchmarking multi-attack evaluation, which captures performance across attack types and attack strengths.

Benchmarking

Extracting Training Data from Diffusion Models

1 code implementation • 30 Jan 2023 • Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace

Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images.

Privacy Preserving

Uncovering Adversarial Risks of Test-Time Adaptation

no code implementations • 29 Jan 2023 • Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

Recently, test-time adaptation (TTA) has been proposed as a promising solution for addressing distribution shifts.

Test-time Adaptation

DP-RAFT: A Differentially Private Recipe for Accelerated Fine-Tuning

no code implementations • 8 Dec 2022 • Ashwinee Panda, Xinyu Tang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

A major direction in differentially private machine learning is differentially private fine-tuning: pretraining a model on a source of "public data" and transferring the extracted features to downstream tasks.

Image Classification

A Light Recipe to Train Robust Vision Transformers

1 code implementation • 15 Sep 2022 • Edoardo Debenedetti, Vikash Sehwag, Prateek Mittal

Additionally, investigating the reasons for the robustness of our models, we show that it is easier to generate strong attacks during training when using our recipe and that this leads to better robustness at test time.

Adversarial Robustness, Data Augmentation (+1 more)

Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation

no code implementations • 22 Jul 2022 • Tong Wu, Tianhao Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal

Our attack can be easily deployed in the real world since it only requires rotating the object, as we show in both image classification and object detection applications.

Data Augmentation, Image Classification (+3 more)

Understanding Robust Learning through the Lens of Representation Similarities

1 code implementation • 20 Jun 2022 • Christian Cianfarani, Arjun Nitin Bhagoji, Vikash Sehwag, Ben Y. Zhao, Prateek Mittal, Haitao Zheng

Representation learning, i.e., the generation of representations useful for downstream applications, is a task of fundamental importance that underlies much of the success of deep neural networks (DNNs).

Representation Learning

Lower Bounds on Cross-Entropy Loss in the Presence of Test-time Adversaries

1 code implementation • 16 Apr 2021 • Arjun Nitin Bhagoji, Daniel Cullina, Vikash Sehwag, Prateek Mittal

In particular, it is critical to determine classifier-agnostic bounds on the training loss to establish when learning is possible.

RobustBench: a standardized adversarial robustness benchmark

1 code implementation • 19 Oct 2020 • Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein

As a research community, we are still lacking a systematic understanding of the progress on adversarial robustness, which often makes it hard to identify the most promising ideas in training robust models.

Adversarial Robustness, Benchmarking (+3 more)

Fast-Convergent Federated Learning

no code implementations • 26 Jul 2020 • Hung T. Nguyen, Vikash Sehwag, Seyyedali Hosseinalipour, Christopher G. Brinton, Mung Chiang, H. Vincent Poor

In this paper, we propose a fast-convergent federated learning algorithm, called FOLB, which performs intelligent sampling of devices in each round of model training to optimize the expected convergence speed.

BIG-bench Machine Learning, Federated Learning

A Critical Evaluation of Open-World Machine Learning

no code implementations • 8 Jul 2020 • Liwei Song, Vikash Sehwag, Arjun Nitin Bhagoji, Prateek Mittal

With our evaluation across 6 OOD detectors, we find that the choice of in-distribution data, model architecture, and OOD data has a strong impact on OOD detection performance, inducing false positive rates in excess of 70%.

BIG-bench Machine Learning, Out of Distribution (OOD) Detection

Time for a Background Check! Uncovering the impact of Background Features on Deep Neural Networks

no code implementations • 24 Jun 2020 • Vikash Sehwag, Rajvardhan Oak, Mung Chiang, Prateek Mittal

With increasing expressive power, deep neural networks have significantly improved the state-of-the-art on image classification datasets, such as ImageNet.

Image Classification

PatchGuard: A Provably Robust Defense against Adversarial Patches via Small Receptive Fields and Masking

2 code implementations • 17 May 2020 • Chong Xiang, Arjun Nitin Bhagoji, Vikash Sehwag, Prateek Mittal

In this paper, we propose a general defense framework called PatchGuard that can achieve high provable robustness while maintaining high clean accuracy against localized adversarial patches.

HYDRA: Pruning Adversarially Robust Neural Networks

4 code implementations • NeurIPS 2020 • Vikash Sehwag, Shiqi Wang, Prateek Mittal, Suman Jana

We demonstrate that our approach, titled HYDRA, achieves compressed networks with state-of-the-art benign and robust accuracy, simultaneously.

Network Pruning

Towards Compact and Robust Deep Neural Networks

no code implementations • 14 Jun 2019 • Vikash Sehwag, Shiqi Wang, Prateek Mittal, Suman Jana

In this work, we rigorously study the extension of network pruning strategies to preserve both benign accuracy and robustness of a network.

Adversarial Robustness, Network Pruning

Better the Devil you Know: An Analysis of Evasion Attacks using Out-of-Distribution Adversarial Examples

no code implementations • 5 May 2019 • Vikash Sehwag, Arjun Nitin Bhagoji, Liwei Song, Chawin Sitawarin, Daniel Cullina, Mung Chiang, Prateek Mittal

A large body of recent work has investigated the phenomenon of evasion attacks using adversarial examples for deep learning systems, where the addition of norm-bounded perturbations to the test inputs leads to incorrect output classification.

Autonomous Driving, General Classification
