Search Results for author: Francesco Croce

Found 26 papers, 21 papers with code

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks

1 code implementation • 2 Apr 2024 • Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion

We show that even the most recent safety-aligned LLMs are not robust to simple adaptive jailbreaking attacks.

In-Context Learning
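
The attacks in this paper are built on simple random search over an adversarial suffix that makes an affirmative reply more likely. Below is a minimal sketch of that idea, assuming a hypothetical `target_logprob` helper that scores the victim model's log-probability of a compliant response (e.g. one starting with "Sure"); the paper's actual attacks additionally use model-specific prompt templates and restarts.

```python
import random
import string

def random_search_suffix(prompt, target_logprob, n_iters=1000, suffix_len=25):
    """Sketch of a random-search suffix attack; `target_logprob` is assumed."""
    chars = string.ascii_letters + string.digits + " !?"
    suffix = "".join(random.choice(chars) for _ in range(suffix_len))
    best = target_logprob(prompt + " " + suffix)
    for _ in range(n_iters):
        cand = list(suffix)
        for pos in random.sample(range(suffix_len), k=3):
            cand[pos] = random.choice(chars)  # mutate a few random positions
        cand = "".join(cand)
        score = target_logprob(prompt + " " + cand)
        if score > best:  # keep mutations that make the target reply more likely
            suffix, best = cand, score
    return prompt + " " + suffix
```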

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

1 code implementation • 28 Mar 2024 • Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

To address these challenges, we introduce JailbreakBench, an open-sourced benchmark with the following components: (1) a new jailbreaking dataset containing 100 unique behaviors, which we call JBB-Behaviors; (2) an evolving repository of state-of-the-art adversarial prompts, which we refer to as jailbreak artifacts; (3) a standardized evaluation framework that includes a clearly defined threat model, system prompts, chat templates, and scoring functions; and (4) a leaderboard that tracks the performance of attacks and defenses for various LLMs.
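
A hedged usage sketch: assuming the `jailbreakbench` package exposes the `read_dataset` and `read_artifact` helpers described in the project repository (names may differ across versions), querying the benchmark looks roughly like this.

```python
import jailbreakbench as jbb  # package name assumed from the project repo

dataset = jbb.read_dataset()  # the 100 JBB-Behaviors
print(dataset.behaviors[0])

# Jailbreak artifacts are stored per attack method and target model.
artifact = jbb.read_artifact(method="PAIR", model_name="vicuna-13b-v1.5")
print(artifact.jailbreaks[0])
```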

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

1 code implementation • 7 Feb 2024 • Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion

There is a consensus that instruction fine-tuning of LLMs requires high-quality data, but what qualifies as high-quality?
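
The baseline is exactly what the title suggests: fine-tune on the longest responses. A minimal sketch, assuming instruction data as a list of dicts with an "output" field (the field name is an assumption):

```python
def longest_k(examples, k=1000):
    """Keep the k instruction-response pairs with the longest responses."""
    return sorted(examples, key=lambda ex: len(ex["output"]), reverse=True)[:k]

# e.g. subset = longest_k(alpaca_examples, k=1000) before fine-tuning
```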

Segment (Almost) Nothing: Prompt-Agnostic Adversarial Attacks on Segmentation Models

no code implementations • 24 Nov 2023 • Francesco Croce, Matthias Hein

General purpose segmentation models are able to generate (semantic) segmentation masks from a variety of prompts, including visual ones (points, boxes, etc.).

Segmentation • Semantic Segmentation

Robust Semantic Segmentation: Strong Adversarial Attacks and Fast Training of Robust Models

1 code implementation • 22 Jun 2023 • Francesco Croce, Naman D Singh, Matthias Hein

While a large amount of work has focused on designing adversarial attacks against image classifiers, only a few methods exist to attack semantic segmentation models.

Image Classification • Segmentation • +1

Revisiting Adversarial Training for ImageNet: Architectures, Training and Generalization across Threat Models

1 code implementation • NeurIPS 2023 • Naman D Singh, Francesco Croce, Matthias Hein

While adversarial training has been extensively studied for ResNet architectures and low resolution datasets like CIFAR, much less is known for ImageNet.

Seasoning Model Soups for Robustness to Adversarial and Natural Distribution Shifts

no code implementations • CVPR 2023 • Francesco Croce, Sylvestre-Alvise Rebuffi, Evan Shelhamer, Sven Gowal

Adversarial training is widely used to make classifiers robust to a specific threat or adversary, such as perturbations bounded in a given $\ell_p$-norm.
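
A "soup" here is a weight-space average of independently trained models. The sketch below shows a uniform soup in PyTorch; it illustrates the souping idea, not the paper's full seasoning procedure for trading off threat models.

```python
import torch

def uniform_soup(state_dicts):
    """Average each parameter tensor across the ingredient models."""
    return {k: torch.stack([sd[k].float() for sd in state_dicts]).mean(dim=0)
            for k in state_dicts[0]}

# e.g. model.load_state_dict(uniform_soup([sd_linf, sd_l2, sd_clean]))
```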

A Modern Look at the Relationship between Sharpness and Generalization

1 code implementation • 14 Feb 2023 • Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion

Overall, we observe that sharpness does not correlate well with generalization but rather with some training parameters like the learning rate that can be positively or negatively correlated with generalization depending on the setup.

Diffusion Visual Counterfactual Explanations

1 code implementation • 21 Oct 2022 • Maximilian Augustin, Valentyn Boreiko, Francesco Croce, Matthias Hein

Two modifications to the diffusion process are key for our DVCEs: an adaptive parameterization, whose hyperparameters generalize across images and models, together with distance regularization and a late start of the diffusion process, allows us to generate images with minimal semantic changes to the original ones but a different classification.

counterfactual • Image Classification

Revisiting adapters with adversarial training

no code implementations • 10 Oct 2022 • Sylvestre-Alvise Rebuffi, Francesco Croce, Sven Gowal

By co-training a neural network on clean and adversarial inputs, it is possible to improve classification accuracy on the clean, non-adversarial inputs.
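
As a reference point, co-training reduces to optimizing a weighted sum of clean and adversarial losses. A minimal sketch, where `attack` is a hypothetical PGD-style perturbation function and the 50/50 weighting is an assumption rather than the paper's adapter-based recipe:

```python
import torch.nn.functional as F

def cotraining_loss(model, attack, x, y):
    x_adv = attack(model, x, y)                  # craft adversarial examples
    loss_clean = F.cross_entropy(model(x), y)    # clean objective
    loss_adv = F.cross_entropy(model(x_adv), y)  # adversarial objective
    return 0.5 * (loss_clean + loss_adv)
```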

On the interplay of adversarial robustness and architecture components: patches, convolution and attention

no code implementations • 14 Sep 2022 • Francesco Croce, Matthias Hein

In recent years, novel architecture components for image classification have been developed, starting with the attention and patches used in transformers.

Adversarial Robustness • Image Classification

Sparse Visual Counterfactual Explanations in Image Space

1 code implementation • 16 May 2022 • Valentyn Boreiko, Maximilian Augustin, Francesco Croce, Philipp Berens, Matthias Hein

Visual counterfactual explanations (VCEs) in image space are an important tool for understanding the decisions of image classifiers, as they show which changes to an image would alter the classifier's decision.

counterfactual

Adversarial Robustness against Multiple and Single $l_p$-Threat Models via Quick Fine-Tuning of Robust Classifiers

1 code implementation • 26 May 2021 • Francesco Croce, Matthias Hein

In this way we get the first multiple-norm robust model for ImageNet and boost the state-of-the-art for multiple-norm robustness to more than $51\%$ on CIFAR-10.

Adversarial Robustness
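
One way to picture multiple-norm robustness is training on the worst case over several $l_p$ attacks per batch; the sketch below illustrates that reading and is not the paper's exact fine-tuning scheme (`attacks` is a list of hypothetical attack callables).

```python
import torch
import torch.nn.functional as F

def worst_case_loss(model, attacks, x, y):
    # e.g. attacks = [pgd_linf, pgd_l2, pgd_l1]; train on the worst threat model
    losses = [F.cross_entropy(model(a(model, x, y)), y) for a in attacks]
    return torch.stack(losses).max()
```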

Mind the box: $l_1$-APGD for sparse adversarial attacks on image classifiers

2 code implementations • 1 Mar 2021 • Francesco Croce, Matthias Hein

Finally, we combine $l_1$-APGD and an adaptation of the Square Attack to $l_1$ into $l_1$-AutoAttack, an ensemble of attacks which reliably assesses adversarial robustness for the threat model of $l_1$-ball intersected with $[0, 1]^d$.

Adversarial Robustness
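
The $l_1$ threat model ships with the released AutoAttack code. A hedged sketch, where the linear model and dummy batch are runnable placeholders and `eps=12` is a commonly used CIFAR-10 $l_1$ budget (an assumption):

```python
import torch
import torch.nn as nn
from autoattack import AutoAttack

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder classifier
x = torch.rand(8, 3, 32, 32)                                     # dummy CIFAR-shaped batch
y = torch.randint(0, 10, (8,))
adversary = AutoAttack(model, norm='L1', eps=12.0, version='standard')
x_adv = adversary.run_standard_evaluation(x, y, bs=8)
```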

RobustBench: a standardized adversarial robustness benchmark

1 code implementation • 19 Oct 2020 • Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein

As a research community, we still lack a systematic understanding of progress on adversarial robustness, which often makes it hard to identify the most promising ideas in training robust models.

Adversarial Robustness • Benchmarking • +3
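
Model zoo access is a one-liner, as documented in the RobustBench README; the leaderboard entry name below is just an example.

```python
from robustbench.data import load_cifar10
from robustbench.utils import load_model

x_test, y_test = load_cifar10(n_examples=64)
model = load_model(model_name='Carmon2019Unlabeled',
                   dataset='cifar10', threat_model='Linf')
acc = (model(x_test).argmax(1) == y_test).float().mean()  # clean accuracy
```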

Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks

2 code implementations • 23 Jun 2020 • Francesco Croce, Maksym Andriushchenko, Naman D. Singh, Nicolas Flammarion, Matthias Hein

We propose a versatile framework based on random search, Sparse-RS, for score-based sparse targeted and untargeted attacks in the black-box setting.

Malware Detection
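
A heavily stripped-down version of the random-search idea for an untargeted $l_0$ attack: propose k random pixels with random values and keep the proposal only if the classifier's loss increases. The real framework schedules k and reuses good proposals; this is only an illustrative sketch.

```python
import torch
import torch.nn.functional as F

def sparse_rs_sketch(model, x, y, k=10, n_iters=500):
    """Untargeted l_0 attack on one image x of shape (1, C, H, W)."""
    best_x, best_loss = x.clone(), F.cross_entropy(model(x), y)
    H, W = x.shape[-2:]
    for _ in range(n_iters):
        cand = x.clone()
        idx = torch.randperm(H * W)[:k]       # choose k pixels at random
        rows, cols = idx // W, idx % W
        cand[..., rows, cols] = torch.rand_like(cand[..., rows, cols])
        loss = F.cross_entropy(model(cand), y)
        if loss > best_loss:                  # keep proposals that hurt the model
            best_x, best_loss = cand, loss
    return best_x
```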

Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks

10 code implementations • ICML 2020 • Francesco Croce, Matthias Hein

The field of defense strategies against adversarial attacks has grown significantly in recent years, but progress is hampered because the evaluation of adversarial defenses is often insufficient and thus gives a false impression of robustness.

Adversarial Robustness
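
Standard usage of the released ensemble (APGD-CE, targeted APGD, targeted FAB, and Square Attack), reusing `model`, `x_test`, `y_test` from the RobustBench sketch above:

```python
from autoattack import AutoAttack

adversary = AutoAttack(model, norm='Linf', eps=8 / 255, version='standard')
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=64)
```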

Square Attack: a query-efficient black-box adversarial attack via random search

1 code implementation • ECCV 2020 • Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, Matthias Hein

We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking.

Adversarial Attack
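
A heavily simplified single-image $l_\infty$ variant of the idea: paste a random square of per-channel $\pm\epsilon$ values and accept the candidate only if the loss increases. The released attack additionally schedules the square size, uses a margin loss, and handles batches.

```python
import random
import torch
import torch.nn.functional as F

def square_attack_sketch(model, x, y, eps=8 / 255, n_iters=1000, s=8):
    """Untargeted l_inf attack on one image x of shape (1, C, H, W) in [0, 1]."""
    x_adv, best_loss = x.clone(), F.cross_entropy(model(x), y)
    H, W = x.shape[-2:]
    for _ in range(n_iters):
        cand = x_adv.clone()
        r, c = random.randrange(H - s), random.randrange(W - s)
        signs = torch.randint(0, 2, (x.shape[1], 1, 1)) * 2 - 1  # +/-1 per channel
        cand[..., r:r + s, c:c + s] = x[..., r:r + s, c:c + s] + eps * signs
        cand = cand.clamp(x - eps, x + eps).clamp(0.0, 1.0)      # stay in threat model
        loss = F.cross_entropy(model(cand), y)
        if loss > best_loss:                                     # score-based acceptance
            x_adv, best_loss = cand, loss
    return x_adv
```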

Sparse and Imperceivable Adversarial Attacks

1 code implementation • ICCV 2019 • Francesco Croce, Matthias Hein

On the other hand, the pixelwise perturbations of sparse attacks are typically large and can thus potentially be detected.

Minimally distorted Adversarial Examples with a Fast Adaptive Boundary Attack

2 code implementations • ICML 2020 • Francesco Croce, Matthias Hein

The robustness of neural network-based classifiers against adversarial manipulation is mainly evaluated with empirical attacks, since methods for exact computation, even when available, do not scale to large networks.

Adversarial Attack

A randomized gradient-free attack on ReLU networks

no code implementations • 28 Nov 2018 • Francesco Croce, Matthias Hein

Relatively fast heuristics have been proposed to produce such adversarial inputs, but the problem of finding the optimal adversarial input, i.e. the one with the minimal change to the input, is NP-hard.

Object Recognition
