no code implementations • 22 Apr 2024 • Javier Rando, Francesco Croce, Kryštof Mitka, Stepan Shabalin, Maksym Andriushchenko, Nicolas Flammarion, Florian Tramèr
Large language models are aligned to be safe, preventing users from generating harmful content like misinformation or instructions for illegal activities.
1 code implementation • 2 Apr 2024 • Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks.
1 code implementation • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong
To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (2) a jailbreaking dataset comprising 100 behaviors -- both original and sourced from prior work -- which align with OpenAI's usage policies; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs.
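The artifacts in such a repository are just structured records, so they can be consumed with a few lines of code. The sketch below assumes a hypothetical local export artifacts.json with fields behavior, prompt, and jailbroken; these names are assumptions, not necessarily the schema of the actual JailbreakBench package.

```python
import json

# Hypothetical local export of jailbreak artifacts; the file name and field
# names are assumptions, not the actual JailbreakBench schema.
with open("artifacts.json") as f:
    artifacts = json.load(f)

# Attack success rate over the benchmark behaviors.
successes = sum(1 for a in artifacts if a["jailbroken"])
print(f"ASR: {successes / len(artifacts):.1%} over {len(artifacts)} behaviors")

# Group prompts by behavior so they can be replayed against a new defense.
by_behavior = {}
for a in artifacts:
    by_behavior.setdefault(a["behavior"], []).append(a["prompt"])
```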
1 code implementation • 7 Feb 2024 • Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
There is a consensus that instruction fine-tuning of LLMs requires high-quality data, but what exactly counts as high-quality data?
no code implementations • 20 Dec 2023 • Edoardo Debenedetti, Zishen Wan, Maksym Andriushchenko, Vikash Sehwag, Kshitij Bhardwaj, Bhavya Kailkhura
Finally, we make our benchmarking framework (built on top of the timm library) publicly available to facilitate future analysis in efficient robust deep learning.
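Because the framework builds on timm, the models it benchmarks can be instantiated with timm's standard factory call. A minimal sketch follows; the model name and the parameter-count proxy for efficiency are illustrative, not the paper's benchmarking code.

```python
import timm
import torch

# Any architecture name from timm.list_models() can be used here.
model = timm.create_model("resnet50", pretrained=True)
model.eval()

# Parameter count as a crude proxy for the efficiency axis of the benchmark.
n_params = sum(p.numel() for p in model.parameters())
print(f"resnet50: {n_params / 1e6:.1f}M parameters")

# A single forward pass on a dummy batch to sanity-check the input resolution.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # (1, 1000) for an ImageNet-1k head
```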
1 code implementation • 29 Nov 2023 • Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee
Training an overparameterized neural network can yield minimizers with different generalization capabilities despite attaining the same training loss.
1 code implementation • 6 Oct 2023 • Maksym Andriushchenko, Francesco D'Angelo, Aditya Varre, Nicolas Flammarion
In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory.
1 code implementation • 13 Jul 2023 • Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi
Averaging neural network parameters is an intuitive method for fusing the knowledge of two independent models.
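A minimal PyTorch sketch of this kind of parameter averaging, assuming the two models share an architecture; the helper name and the choice to leave integer buffers untouched are mine, not the paper's code.

```python
import copy
import torch

def average_parameters(model_a, model_b, alpha=0.5):
    """Interpolate the parameters of two architecturally identical models."""
    avg = copy.deepcopy(model_a)
    state_a, state_b = model_a.state_dict(), model_b.state_dict()
    avg_state = {}
    for k in state_a:
        if state_a[k].is_floating_point():
            avg_state[k] = alpha * state_a[k] + (1.0 - alpha) * state_b[k]
        else:
            avg_state[k] = state_a[k]  # e.g. BatchNorm counters: keep as-is
    avg.load_state_dict(avg_state)
    return avg
```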
1 code implementation • NeurIPS 2023 • Klim Kireev, Maksym Andriushchenko, Carmela Troncoso, Nicolas Flammarion
We present a method that allows us to train adversarially robust deep networks for tabular data and to transfer this robustness to other classifiers via universal robust embeddings tailored to categorical data.
1 code implementation • 14 Feb 2023 • Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion
Overall, we observe that sharpness does not correlate well with generalization but rather with some training parameters like the learning rate that can be positively or negatively correlated with generalization depending on the setup.
1 code implementation • 11 Oct 2022 • Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion
We present empirical observations that commonly used large step sizes (i) lead the iterates to jump from one side of a valley to the other, causing loss stabilization, and (ii) induce, through this stabilization, a hidden stochastic dynamics orthogonal to the bouncing directions that implicitly biases the iterates toward sparse predictors.
1 code implementation • 13 Jun 2022 • Maksym Andriushchenko, Nicolas Flammarion
We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements.
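For context, a single sharpness-aware minimization update can be sketched in a few lines of PyTorch; rho and the overall structure below are illustrative assumptions, not the paper's training code.

```python
import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One SAM update: ascend to a nearby point in weight space, then apply
    the gradient computed there at the original weights."""
    # First pass: gradient at the current weights.
    loss_fn(model(x), y).backward()
    grads = [p.grad for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12

    # Ascent step of size rho in the normalized gradient direction.
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            e = rho * p.grad / grad_norm if p.grad is not None else None
            if e is not None:
                p.add_(e)
            eps.append(e)
    model.zero_grad()

    # Second pass: gradient at the perturbed weights.
    loss_fn(model(x), y).backward()

    # Restore the original weights and take the base optimizer step.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
```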
no code implementations • 25 Feb 2022 • Maksym Andriushchenko, Xiaoyang Rebecca Li, Geoffrey Oxholm, Thomas Gittings, Tu Bui, Nicolas Flammarion, John Collomosse
Finally, we show how to train an adversarially robust image comparator model for detecting editorial changes in matched images.
no code implementations • 29 Sep 2021 • Maksym Andriushchenko, Nicolas Flammarion
Next, we discuss why SAM can be helpful in the noisy-label setting, where we first show that it can improve generalization even for linear classifiers.
1 code implementation • 3 Mar 2021 • Klim Kireev, Maksym Andriushchenko, Nicolas Flammarion
First, we show that, when used with an appropriately selected perturbation radius, $\ell_p$ adversarial training can serve as a strong baseline against common corruptions, improving both accuracy and calibration.
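A minimal sketch of $\ell_\infty$ adversarial training with PGD is given below; eps, alpha, and the step count are illustrative defaults rather than the radii selected in the paper.

```python
import torch

def pgd_linf(model, x, y, loss_fn, eps=4 / 255, alpha=1 / 255, steps=10):
    """Projected gradient descent inside an l_inf ball of radius eps."""
    delta = torch.zeros_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = loss_fn(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()

# Inside the training loop, clean batches are simply replaced by adversarial ones:
#   x_adv = pgd_linf(model, x, y, loss_fn)
#   loss = loss_fn(model(x_adv), y); loss.backward(); optimizer.step()
```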
1 code implementation • 19 Oct 2020 • Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein
As a research community, we are still lacking a systematic understanding of the progress on adversarial robustness which often makes it hard to identify the most promising ideas in training robust models.
1 code implementation • NeurIPS 2020 • Maksym Andriushchenko, Nicolas Flammarion
We show that adding a random step to FGSM, as proposed in Wong et al. (2020), does not prevent catastrophic overfitting, and that randomness is not important per se -- its main role being simply to reduce the magnitude of the perturbation.
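The claim about perturbation magnitude can be checked with a one-dimensional Monte-Carlo toy; eps and alpha below are commonly used fast-training values, but the experiment is only illustrative.

```python
import torch

eps, alpha = 8 / 255, 10 / 255

# Plain FGSM perturbs every coordinate by exactly eps. With the random start of
# Wong et al. (2020), a coordinate is first drawn uniformly from [-eps, eps],
# then moved by alpha in the gradient direction (taken as +1 here) and clipped.
delta0 = torch.empty(1_000_000).uniform_(-eps, eps)
delta = (delta0 + alpha).clamp(-eps, eps)

# The average magnitude drops below eps (roughly 0.86 * eps for these values),
# i.e. the random step mainly shrinks the effective perturbation.
print(float(delta.abs().mean()) / eps)
```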
2 code implementations • 23 Jun 2020 • Francesco Croce, Maksym Andriushchenko, Naman D. Singh, Nicolas Flammarion, Matthias Hein
We propose a versatile framework based on random search, Sparse-RS, for score-based sparse targeted and untargeted attacks in the black-box setting.
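A toy version of such a score-based random search under an $l_0$ budget of k pixels is sketched below; the proposal and acceptance scheme is simplified relative to the actual Sparse-RS schedule.

```python
import numpy as np

def sparse_rs_toy(score_fn, x, k=10, iters=500, seed=0):
    """score_fn(x) returns a margin to minimize (negative means misclassified);
    only black-box queries to score_fn are used."""
    rng = np.random.default_rng(seed)
    h, w, c = x.shape

    def apply(idx, vals):
        flat = x.reshape(h * w, c).copy()
        flat[idx] = vals
        return flat.reshape(h, w, c)

    # Current solution: k perturbed pixel locations with extreme values.
    idx = rng.choice(h * w, size=k, replace=False)
    vals = rng.integers(0, 2, size=(k, c)).astype(x.dtype)
    best = score_fn(apply(idx, vals))

    for _ in range(iters):
        new_idx, new_vals = idx.copy(), vals.copy()
        j = rng.integers(k)                 # resample one perturbed pixel
        new_idx[j] = rng.integers(h * w)
        new_vals[j] = rng.integers(0, 2, size=c)
        score = score_fn(apply(new_idx, new_vals))
        if score < best:                    # keep only improving proposals
            idx, vals, best = new_idx, new_vals, score
        if best < 0:
            break
    return apply(idx, vals)
```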
2 code implementations • ICLR 2021 • Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow
Fine-tuning pre-trained transformer-based language models such as BERT has become a common practice dominating leaderboards across various NLP benchmarks.
1 code implementation • ECCV 2020 • Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, Matthias Hein
We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking.
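One proposal step of such an attack is easy to sketch: perturb a random square window by $\pm\epsilon$ per channel and keep the candidate only if the black-box score improves. The sketch below covers a single $l_\infty$ proposal and omits the window-size schedule of the published attack.

```python
import numpy as np

def square_proposal(x_adv, x, eps, s, rng):
    """Return a candidate that differs from x_adv in one random s x s window,
    where the window is set to the original image x shifted by +/- eps per
    channel (and clipped to [0, 1]). Acceptance is decided by black-box scores."""
    h, w, c = x.shape
    r = rng.integers(0, h - s + 1)
    col = rng.integers(0, w - s + 1)
    signs = rng.choice([-1.0, 1.0], size=c)   # one sign per color channel
    cand = x_adv.copy()
    cand[r:r + s, col:col + s, :] = np.clip(
        x[r:r + s, col:col + s, :] + eps * signs, 0.0, 1.0
    )
    return cand
```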
1 code implementation • NeurIPS 2019 • Maksym Andriushchenko, Matthias Hein
The problem of adversarial robustness has been studied extensively for neural networks.
1 code implementation • CVPR 2019 • Matthias Hein, Maksym Andriushchenko, Julian Bitterwolf
We show that this technique is surprisingly effective in reducing the confidence of predictions far away from the training data, while maintaining high-confidence predictions and a test error on the original classification task comparable to standard training.
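A minimal sketch of that kind of objective: standard cross-entropy on in-distribution data plus a term that pushes predictions on out-of-distribution inputs (e.g. noise images) toward the uniform distribution. The weighting lam is illustrative, and the paper's adversarial variant additionally searches for worst-case out-of-distribution points.

```python
import torch.nn.functional as F

def low_confidence_loss(model, x_in, y_in, x_out, lam=1.0):
    """Cross-entropy on clean data plus the average cross-entropy between the
    predictions on out-of-distribution inputs and the uniform distribution."""
    loss_in = F.cross_entropy(model(x_in), y_in)
    log_p_out = F.log_softmax(model(x_out), dim=1)
    loss_out = -log_p_out.mean()  # cross-entropy to the uniform label distribution
    return loss_in + lam * loss_out
```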
1 code implementation • 29 Oct 2018 • Marius Mosbach, Maksym Andriushchenko, Thomas Trost, Matthias Hein, Dietrich Klakow
Recently, Kannan et al. [2018] proposed several logit regularization methods to improve the adversarial robustness of classifiers.
2 code implementations • 17 Oct 2018 • Francesco Croce, Maksym Andriushchenko, Matthias Hein
It has been shown that neural network classifiers are not robust to small adversarial perturbations of their inputs.
no code implementations • NeurIPS 2017 • Matthias Hein, Maksym Andriushchenko
In this paper we give, for the first time, formal guarantees on the robustness of a classifier by deriving instance-specific lower bounds on the norm of the input manipulation required to change the classifier's decision.
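The flavor of such an instance-specific guarantee can be stated compactly; the following is a paraphrase with my own notation, not the paper's exact theorem. If the classifier $f$ predicts class $c$ at $x$, the decision is unchanged for every perturbation $\delta$ with

```latex
\|\delta\|_p \;\le\; \min\Bigg\{\, \min_{j \neq c}
\frac{f_c(x) - f_j(x)}{\max_{y \in B_p(x,R)} \|\nabla f_c(y) - \nabla f_j(y)\|_q},\; R \Bigg\},
\qquad \frac{1}{p} + \frac{1}{q} = 1,
```

where $R$ bounds the region over which the gradient difference is controlled.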