1 code implementation • 2 Apr 2024 • Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks.
1 code implementation • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong
To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) a new jailbreaking dataset containing 100 unique behaviors, which we call JBB-Behaviors; (2) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs.
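A minimal sketch of how the JBB-Behaviors dataset could be loaded, assuming it is distributed on the Hugging Face Hub under `JailbreakBench/JBB-Behaviors`; the configuration name, splits, and column names below are assumptions, not confirmed by this abstract:

```python
# Hedged sketch: load JBB-Behaviors via the Hugging Face datasets library.
# Repo id, config name, split and column names are all assumptions.
from datasets import load_dataset

behaviors = load_dataset("JailbreakBench/JBB-Behaviors", "behaviors")
for row in list(behaviors["harmful"])[:3]:
    print(row["Behavior"], "->", row["Category"])
```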
1 code implementation • 19 Feb 2024 • Christian Schlarmann, Naman Deep Singh, Francesco Croce, Matthias Hein
The CLIP model, or one of its variants, is used as a frozen vision encoder in many vision-language models (VLMs), e.g., LLaVA and OpenFlamingo.
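As an illustration of this design, here is a minimal sketch of the frozen-CLIP pattern: the vision tower is kept fixed while a small projection (hypothetical here) maps its features into the language model's token space; the model name and the 4096-dimensional target are assumptions.

```python
import torch
from transformers import CLIPImageProcessor, CLIPVisionModel

# Frozen CLIP vision tower, as used (in spirit) by LLaVA-style VLMs.
vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14")
vision.requires_grad_(False)
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")

# Hypothetical trainable projection into the LLM's embedding space (dim assumed).
proj = torch.nn.Linear(vision.config.hidden_size, 4096)

def encode(image):
    pixels = processor(images=image, return_tensors="pt").pixel_values
    with torch.no_grad():                      # the encoder itself is never updated
        feats = vision(pixel_values=pixels).last_hidden_state
    return proj(feats)                         # visual tokens fed to the LLM
```

Because the encoder is shared and frozen, any adversarial weakness of CLIP itself propagates to every VLM built on top of it, which is what motivates making the encoder robust.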
1 code implementation • 7 Feb 2024 • Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion
There is a consensus that instruction fine-tuning of LLMs requires high-quality data, but what counts as high-quality?
no code implementations • 24 Nov 2023 • Francesco Croce, Matthias Hein
General purpose segmentation models are able to generate (semantic) segmentation masks from a variety of prompts, including visual prompts (points, boxes, etc.) and textual ones.
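For concreteness, a sketch of prompt-driven mask generation with Segment Anything (SAM), one instance of such general-purpose models; the checkpoint path and the in-scope `image` array are assumptions:

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # path assumed
predictor = SamPredictor(sam)
predictor.set_image(image)  # H x W x 3 uint8 RGB array, assumed to be in scope

# Point prompt: one foreground click.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]), point_labels=np.array([1]))

# Box prompt: (x0, y0, x1, y1) in pixel coordinates.
masks, scores, _ = predictor.predict(box=np.array([100, 100, 400, 400]))
```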
1 code implementation • 22 Jun 2023 • Francesco Croce, Naman D Singh, Matthias Hein
While a large amount of work has focused on designing adversarial attacks against image classifiers, only a few methods exist to attack semantic segmentation models.
1 code implementation • NeurIPS 2023 • Naman D Singh, Francesco Croce, Matthias Hein
While adversarial training has been extensively studied for ResNet architectures and low resolution datasets like CIFAR, much less is known for ImageNet.
no code implementations • CVPR 2023 • Francesco Croce, Sylvestre-Alvise Rebuffi, Evan Shelhamer, Sven Gowal
Adversarial training is widely used to make classifiers robust to a specific threat or adversary, such as $\ell_p$-norm bounded perturbations of a given $p$-norm.
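A minimal sketch of the standard recipe for a single threat model, $\ell_\infty$ adversarial training with PGD (Madry et al. style); `model`, `loader`, and `optimizer` are assumed to exist:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Random start inside the eps-ball, then projected gradient ascent on the loss.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta.data = (x + delta).clamp(0, 1) - x   # keep x + delta a valid image
        delta.requires_grad_(True)
    return (x + delta).detach()

for x, y in loader:  # train on adversarial examples only
    optimizer.zero_grad()
    F.cross_entropy(model(pgd_attack(model, x, y)), y).backward()
    optimizer.step()
```

Training against a single $p$-norm in this way typically yields little robustness to other norms, which is the gap this line of work addresses.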
1 code implementation • 14 Feb 2023 • Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion
Overall, we observe that sharpness does not correlate well with generalization, but rather with training parameters such as the learning rate, which can be positively or negatively correlated with generalization depending on the setup.
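To make the quantity concrete, a rough sketch of estimating worst-case sharpness, i.e. the increase of the loss under a small weight perturbation, via a few ascent steps; there is no exact projection onto the $\rho$-ball here, so this is a crude heuristic, and `model` plus a batch `(x, y)` are assumed:

```python
import copy
import torch
import torch.nn.functional as F

def worst_case_sharpness(model, x, y, rho=0.05, steps=5):
    base = F.cross_entropy(model(x), y).item()
    probe = copy.deepcopy(model)               # perturb a copy of the weights
    opt = torch.optim.SGD(probe.parameters(), lr=rho / steps)
    for _ in range(steps):
        loss = -F.cross_entropy(probe(x), y)   # gradient *ascent* on the loss
        opt.zero_grad(); loss.backward(); opt.step()
    return F.cross_entropy(probe(x), y).item() - base
```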
1 code implementation • 21 Oct 2022 • Maximilian Augustin, Valentyn Boreiko, Francesco Croce, Matthias Hein
Two modifications to the diffusion process are key for our DVCEs: an adaptive parameterization, whose hyperparameters generalize across images and models, combined with distance regularization and a late start of the diffusion process, allows us to generate images with minimal semantic changes to the originals while still flipping the classification.
no code implementations • 10 Oct 2022 • Sylvestre-Alvise Rebuffi, Francesco Croce, Sven Gowal
By co-training a neural network on clean and adversarial inputs, it is possible to improve classification accuracy on the clean, non-adversarial inputs.
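A sketch of the co-training objective, mixing a clean and an adversarial loss term with a weight $\lambda$ (value assumed), reusing the `pgd_attack` sketch above:

```python
lam = 0.5  # assumed trade-off between clean and adversarial terms
for x, y in loader:
    x_adv = pgd_attack(model, x, y)
    loss = lam * F.cross_entropy(model(x), y) \
         + (1 - lam) * F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```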
no code implementations • 14 Sep 2022 • Francesco Croce, Matthias Hein
In recent years, novel architecture components for image classification have been developed, starting with the attention and patches used in transformers.
1 code implementation • 16 May 2022 • Valentyn Boreiko, Maximilian Augustin, Francesco Croce, Philipp Berens, Matthias Hein
Visual counterfactual explanations (VCEs) in image space are an important tool to understand decisions of image classifiers, as they show which changes to an image would flip the classifier's decision.
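One simple way to realize this idea (a sketch, not the authors' exact method): push an image toward a chosen target class with normalized gradient steps while projecting back onto a small $\ell_2$ ball around the original, so only minimal changes are allowed; `model` is assumed:

```python
import torch
import torch.nn.functional as F

def l2_counterfactual(model, x, target_class, eps=3.0, alpha=0.3, steps=50):
    y = torch.full((x.size(0),), target_class, dtype=torch.long)
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)   # low loss = target class
        grad, = torch.autograd.grad(loss, delta)
        g = grad / grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = (delta - alpha * g).detach()          # step toward the target
        n = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta * (eps / n).clamp(max=1.0)      # project onto the l2 ball
        delta.data = (x + delta).clamp(0, 1) - x      # stay a valid image
        delta.requires_grad_(True)
    return (x + delta).detach()
```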
1 code implementation • 28 Feb 2022 • Francesco Croce, Sven Gowal, Thomas Brunner, Evan Shelhamer, Matthias Hein, Taylan Cemgil
Adaptive defenses, which optimize at test time, promise to improve adversarial robustness.
1 code implementation • 26 May 2021 • Francesco Croce, Matthias Hein
In this way, we obtain the first multiple-norm robust model for ImageNet and raise the state of the art for multiple-norm robustness on CIFAR-10 to more than $51\%$.
2 code implementations • 1 Mar 2021 • Francesco Croce, Matthias Hein
Finally, we combine $l_1$-APGD and an adaptation of the Square Attack to $l_1$ into $l_1$-AutoAttack, an ensemble of attacks which reliably assesses adversarial robustness for the threat model of $l_1$-ball intersected with $[0, 1]^d$.
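A sketch of how such an evaluation might be run with the released `autoattack` package, assuming its `L1` mode corresponds to this paper; `model`, the batch `x, y`, and the budget `eps=12` (a common CIFAR-10 choice) are assumptions:

```python
from autoattack import AutoAttack

# model: classifier returning logits; x, y: a batch of images in [0, 1]^d.
adversary = AutoAttack(model, norm='L1', eps=12.0, version='standard')
x_adv = adversary.run_standard_evaluation(x, y, bs=128)
```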
1 code implementation • 19 Oct 2020 • Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein
As a research community, we still lack a systematic understanding of the progress on adversarial robustness, which often makes it hard to identify the most promising ideas in training robust models.
2 code implementations • 23 Jun 2020 • Francesco Croce, Maksym Andriushchenko, Naman D. Singh, Nicolas Flammarion, Matthias Hein
We propose a versatile framework based on random search, Sparse-RS, for score-based sparse targeted and untargeted attacks in the black-box setting.
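A toy sketch of the underlying random-search idea (greatly simplified: the real Sparse-RS also resamples pixel *locations* and uses margin-based scores): propose random changes to a sparse set of pixels and greedily keep those that lower the true-class score. `model` returns logits; `x` is one image in $[0,1]$:

```python
import torch

@torch.no_grad()
def sparse_rs_toy(model, x, y, k=50, iters=1000):
    c, h, w = x.shape
    idx = torch.randperm(h * w)[:k]                    # fixed sparse pixel set
    best = x.clone()
    best[:, idx // w, idx % w] = torch.rand(c, k)      # random init on the set
    best_score = model(best.unsqueeze(0))[0, y]
    for _ in range(iters):
        cand = best.clone()
        j = idx[torch.randint(k, (1,))]                # resample one pixel's colour
        cand[:, j // w, j % w] = torch.rand(c, 1)
        score = model(cand.unsqueeze(0))[0, y]
        if score < best_score:                         # greedy, score-based accept
            best, best_score = cand, score
    return best
```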
10 code implementations • ICML 2020 • Francesco Croce, Matthias Hein
The field of defense strategies against adversarial attacks has grown significantly in recent years, but progress is hampered because the evaluation of adversarial defenses is often insufficient and thus gives a wrong impression of robustness.
1 code implementation • ECCV 2020 • Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, Matthias Hein
We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking.
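A toy sketch of the core loop (the actual attack uses a tuned schedule for the square size and a stripe initialization): sample random square-shaped $\pm\epsilon$ updates and keep one only if the loss increases, never touching a gradient. `model` and a single-image batch `x, y` are assumed:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def square_attack_toy(model, x, y, eps=8/255, iters=1000, s=8):
    c, h, w = x.shape[1:]
    x_adv = (x + eps * torch.sign(torch.randn_like(x))).clamp(0, 1)
    best = F.cross_entropy(model(x_adv), y)
    for _ in range(iters):
        cand = x_adv.clone()
        i = torch.randint(h - s + 1, (1,)).item()
        j = torch.randint(w - s + 1, (1,)).item()
        colour = eps * torch.sign(torch.randn(1, c, 1, 1))   # one colour per square
        cand[:, :, i:i+s, j:j+s] = (x[:, :, i:i+s, j:j+s] + colour).clamp(0, 1)
        loss = F.cross_entropy(model(cand), y)
        if loss > best:                  # higher loss = closer to misclassification
            x_adv, best = cand, loss
    return x_adv
```

Because acceptance depends only on the model's scores, gradient masking in a defense does not help against this attack.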
1 code implementation • ICCV 2019 • Francesco Croce, Matthias Hein
On the other hand, the pixelwise perturbations of sparse attacks are typically large and thus potentially detectable.
2 code implementations • ICML 2020 • Francesco Croce, Matthias Hein
The robustness of neural-network-based classifiers against adversarial manipulation is mainly evaluated with empirical attacks, since methods for exact computation, even when available, do not scale to large networks.
1 code implementation • ICLR 2020 • Francesco Croce, Matthias Hein
In recent years several adversarial attacks and defenses have been proposed.
1 code implementation • 27 Mar 2019 • Francesco Croce, Jonas Rauber, Matthias Hein
Modern neural networks are highly non-robust against adversarial manipulation.
no code implementations • 28 Nov 2018 • Francesco Croce, Matthias Hein
Relatively fast heuristics have been proposed to produce such adversarial inputs, but the problem of finding the optimal adversarial input, i.e., the one with the minimal change to the input, is NP-hard.
2 code implementations • 17 Oct 2018 • Francesco Croce, Maksym Andriushchenko, Matthias Hein
It has been shown that neural network classifiers are not robust.