Search Results for author: Nicolas Flammarion

Found 48 papers, 17 papers with code

Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks

no code implementations 8 Mar 2024 Hristo Papazov, Scott Pesme, Nicolas Flammarion

In this work, we investigate the effect of momentum on the optimisation trajectory of gradient descent.

Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

1 code implementation 7 Feb 2024 Hao Zhao, Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion

There is a consensus that instruction fine-tuning of LLMs requires high-quality data, but what are they?

Early alignment in two-layer networks training is a two-edged sword

1 code implementation 19 Jan 2024 Etienne Boursier, Nicolas Flammarion

Training neural networks with first order optimisation methods is at the core of the empirical success of deep learning.

Why Do We Need Weight Decay in Modern Deep Learning?

1 code implementation 6 Oct 2023 Maksym Andriushchenko, Francesco D'Angelo, Aditya Varre, Nicolas Flammarion

In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory.

Learning Theory · Stochastic Optimization

Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings

1 code implementation NeurIPS 2023 Klim Kireev, Maksym Andriushchenko, Carmela Troncoso, Nicolas Flammarion

We present a method that allows us to train adversarially robust deep networks for tabular data and to transfer this robustness to other classifiers via universal robust embeddings tailored to categorical data.

Adversarial Robustness · Fraud Detection +2

First-order ANIL learns linear representations despite misspecified latent dimension

no code implementations 2 Mar 2023 Oğuz Kaan Yuksel, Etienne Boursier, Nicolas Flammarion

In particular, model-agnostic methods look for initialisation points from which gradient descent quickly adapts to any new task.

Meta-Learning

Linearization Algorithms for Fully Composite Optimization

no code implementations 24 Feb 2023 Maria-Luiza Vladarean, Nikita Doikov, Martin Jaggi, Nicolas Flammarion

This paper studies first-order algorithms for solving fully composite optimization problems over convex and compact sets.

(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

no code implementations 17 Feb 2023 Mathieu Even, Scott Pesme, Suriya Gunasekar, Nicolas Flammarion

In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over diagonal linear networks.

regression
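
For context, a commonly studied form of a 2-layer diagonal linear network in this line of work (the paper's exact parametrisation may differ) is a predictor that is linear in the input but quadratic in the trainable parameters:

$$ f_{u,v}(x) = \langle u \odot v,\, x\rangle = \sum_{i=1}^d u_i v_i x_i, \qquad u, v \in \mathbb{R}^d, $$

so that (S)GD is run over $(u, v)$ rather than directly over the effective linear weights $\beta = u \odot v$, which is what makes the implicit regularisation question non-trivial.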

A Modern Look at the Relationship between Sharpness and Generalization

1 code implementation 14 Feb 2023 Maksym Andriushchenko, Francesco Croce, Maximilian Müller, Matthias Hein, Nicolas Flammarion

Overall, we observe that sharpness does not correlate well with generalization but rather with some training parameters like the learning rate that can be positively or negatively correlated with generalization depending on the setup.

SGD with Large Step Sizes Learns Sparse Features

1 code implementation 11 Oct 2022 Maksym Andriushchenko, Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

We present empirical observations that commonly used large step sizes (i) lead the iterates to jump from one side of a valley to the other, causing loss stabilization, and (ii) induce, through this stabilization, a hidden stochastic dynamics orthogonal to the bouncing directions that implicitly biases the iterates toward sparse predictors.
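
As a toy caricature of the bouncing effect described above (purely illustrative, not the paper's experimental setup), gradient descent on a one-dimensional quadratic with a step size close to $2/L$ jumps from one side of the valley to the other at every iteration while the loss decreases only slowly:

```python
# f(x) = 0.5 * L * x^2 with smoothness constant L; its gradient is L * x.
L = 1.0
step = 1.9 / L   # large step size: 1/L < step < 2/L makes the iterate bounce
x = 5.0

for k in range(10):
    loss = 0.5 * L * x**2
    print(f"iter {k:2d}   x = {x:+.4f}   loss = {loss:.4f}")
    x = x - step * L * x   # multiplies x by (1 - step * L) = -0.9

# The sign of x alternates (the iterate jumps across the valley) while |x|
# shrinks by only 10% per step, so the loss stays nearly flat for a while.
```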

Label noise (stochastic) gradient descent implicitly solves the Lasso for quadratic parametrisation

no code implementations 20 Jun 2022 Loucas Pillaud-Vivien, Julien Reygner, Nicolas Flammarion

Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks.

Towards Understanding Sharpness-Aware Minimization

1 code implementation 13 Jun 2022 Maksym Andriushchenko, Nicolas Flammarion

We further study the properties of the implicit bias on non-linear networks empirically, where we show that fine-tuning a standard model with SAM can lead to significant generalization improvements.

Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs

1 code implementation 2 Jun 2022 Etienne Boursier, Loucas Pillaud-Vivien, Nicolas Flammarion

The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution.

Accelerated SGD for Non-Strongly-Convex Least Squares

no code implementations 3 Mar 2022 Aditya Varre, Nicolas Flammarion

We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting.

regression

Trace norm regularization for multi-task learning with scarce data

1 code implementation 14 Feb 2022 Etienne Boursier, Mikhail Konobeev, Nicolas Flammarion

Multi-task learning leverages structural similarities between multiple tasks to learn despite very few samples.

Meta-Learning · Multi-Task Learning

Sequential Algorithms for Testing Closeness of Distributions

no code implementations NeurIPS 2021 Aadil Oufkir, Omar Fawzi, Nicolas Flammarion, Aurélien Garivier

For a general alphabet size $n$, we give a sequential algorithm that uses no more samples than its batch counterpart, and possibly fewer if the actual distance between $\mathcal{D}_1$ and $\mathcal{D}_2$ is larger than $\epsilon$.

Linear Speedup in Personalized Collaborative Learning

1 code implementation 10 Nov 2021 El Mahdi Chayti, Sai Praneeth Karimireddy, Sebastian U. Stich, Nicolas Flammarion, Martin Jaggi

Collaborative training can improve the accuracy of a model for a user by trading off the model's bias (introduced by using data from other users who are potentially different) against its variance (due to the limited amount of data on any single user).

Federated Learning · Stochastic Optimization

Understanding Sharpness-Aware Minimization

no code implementations 29 Sep 2021 Maksym Andriushchenko, Nicolas Flammarion

Next, we discuss why SAM can be helpful in the noisy label setting where we first show that it can help to improve generalization even for linear classifiers.

Learning with noisy labels

Implicit Bias of SGD for Diagonal Linear Networks: a Provable Benefit of Stochasticity

no code implementations NeurIPS 2021 Scott Pesme, Loucas Pillaud-Vivien, Nicolas Flammarion

Understanding the implicit bias of training algorithms is of crucial importance in order to explain the success of overparametrised neural networks.

A Continuized View on Nesterov Acceleration for Stochastic Gradient Descent and Randomized Gossip

1 code implementation 10 Jun 2021 Mathieu Even, Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Hadrien Hendrikx, Laurent Massoulié, Adrien Taylor

We introduce the continuized Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.
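
For reference, one standard discrete form of Nesterov acceleration for a smooth convex $f$ (the scheme whose iteration index the continuized variant replaces by a continuous time parameter; the paper's exact formulation may differ) is

$$ x_{k+1} = y_k - \gamma \nabla f(y_k), \qquad y_{k+1} = x_{k+1} + \frac{k}{k+3}\,(x_{k+1} - x_k). $$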

On the effectiveness of adversarial training against common corruptions

1 code implementation 3 Mar 2021 Klim Kireev, Maksym Andriushchenko, Nicolas Flammarion

First, we show that, when used with an appropriately selected perturbation radius, $\ell_p$ adversarial training can serve as a strong baseline against common corruptions improving both accuracy and calibration.

Data Augmentation

A Continuized View on Nesterov Acceleration

no code implementations 11 Feb 2021 Raphaël Berthier, Francis Bach, Nicolas Flammarion, Pierre Gaillard, Adrien Taylor

We introduce the "continuized" Nesterov acceleration, a close variant of Nesterov acceleration whose variables are indexed by a continuous time parameter.

Distributed, Parallel, and Cluster Computing · Optimization and Control

Last iterate convergence of SGD for Least-Squares in the Interpolation regime

no code implementations NeurIPS 2021 Aditya Varre, Loucas Pillaud-Vivien, Nicolas Flammarion

Motivated by the recent successes of neural networks that have the ability to fit the data perfectly and generalize well, we study the noiseless model in the fundamental least-squares setup.

Stochastic Optimization

RobustBench: a standardized adversarial robustness benchmark

1 code implementation 19 Oct 2020 Francesco Croce, Maksym Andriushchenko, Vikash Sehwag, Edoardo Debenedetti, Nicolas Flammarion, Mung Chiang, Prateek Mittal, Matthias Hein

As a research community, we are still lacking a systematic understanding of the progress on adversarial robustness which often makes it hard to identify the most promising ideas in training robust models.

Adversarial Robustness · Benchmarking +3

Optimal Robust Linear Regression in Nearly Linear Time

no code implementations 16 Jul 2020 Yeshwanth Cherapanamjeri, Efe Aras, Nilesh Tripuraneni, Michael I. Jordan, Nicolas Flammarion, Peter L. Bartlett

We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$ (with $X \in \mathbb{R}^d$ and $\epsilon$ independent), in which an $\eta$ fraction of the samples have been adversarially corrupted.

regression
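
A minimal numpy sketch of the contamination model described above, showing how ordinary least squares degrades once an $\eta$ fraction of the responses is corrupted (this only illustrates the data model, not the nearly-linear-time estimator of the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eta = 2000, 20, 0.1                    # samples, dimension, corruption level

w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)    # Y = <X, w*> + eps

w_clean, *_ = np.linalg.lstsq(X, y, rcond=None)

# Crude stand-in for an adversary: replace an eta fraction of responses by outliers.
y_corr = y.copy()
idx = rng.choice(n, size=int(eta * n), replace=False)
y_corr[idx] = 100.0
w_corr, *_ = np.linalg.lstsq(X, y_corr, rcond=None)

print("OLS error on clean data    :", np.linalg.norm(w_clean - w_star))
print("OLS error on corrupted data:", np.linalg.norm(w_corr - w_star))
```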

Understanding and Improving Fast Adversarial Training

1 code implementation NeurIPS 2020 Maksym Andriushchenko, Nicolas Flammarion

We show that adding a random step to FGSM, as proposed in Wong et al. (2020), does not prevent catastrophic overfitting, and that randomness is not important per se -- its main role being simply to reduce the magnitude of the perturbation.
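
As a hedged sketch of the scheme discussed above, FGSM with a random initial step in the style of Wong et al. (2020); the toy model, data and step sizes below are illustrative assumptions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # toy classifier
x = torch.rand(8, 3, 32, 32)                 # fake images in [0, 1]
y = torch.randint(0, 10, (8,))
eps, alpha = 8 / 255, 10 / 255               # l_inf radius and FGSM step size

# Random start inside the l_inf ball, then one signed-gradient step.
delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
loss = nn.functional.cross_entropy(model(x + delta), y)
loss.backward()
with torch.no_grad():
    delta = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
    x_adv = (x + delta).clamp(0, 1)          # keep the perturbed images valid

print("adversarial batch:", tuple(x_adv.shape))
```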

On Convergence-Diagnostic based Step Sizes for Stochastic Gradient Descent

no code implementations ICML 2020 Scott Pesme, Aymeric Dieuleveut, Nicolas Flammarion

Constant step-size Stochastic Gradient Descent exhibits two phases: a transient phase during which iterates make fast progress towards the optimum, followed by a stationary phase during which iterates oscillate around the optimal point.

Online Robust Regression via SGD on the l1 loss

no code implementations NeurIPS 2020 Scott Pesme, Nicolas Flammarion

We consider the robust linear regression problem in the online setting where we have access to the data in a streaming manner, one data point after the other.

regression
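
A minimal sketch of the idea in the title, streaming SGD on the $\ell_1$ (absolute) loss, whose subgradient at an incoming pair $(x_t, y_t)$ is $\mathrm{sign}(\langle w, x_t\rangle - y_t)\, x_t$; the step-size schedule and corruption model below are illustrative assumptions rather than the paper's exact setting:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 10
w_star = rng.normal(size=d)
w = np.zeros(d)

for t in range(1, 50001):                    # data arrives one point at a time
    x = rng.normal(size=d)
    y = x @ w_star + 0.1 * rng.normal()
    if rng.random() < 0.2:                   # a fraction of the stream is corrupted
        y += 50.0 * rng.standard_cauchy()
    step = 1.0 / np.sqrt(t)                  # illustrative decaying step size
    w -= step * np.sign(x @ w - y) * x       # subgradient step on |<w, x> - y|

print("estimation error:", np.linalg.norm(w - w_star))
```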

Sparse-RS: a versatile framework for query-efficient sparse black-box adversarial attacks

2 code implementations 23 Jun 2020 Francesco Croce, Maksym Andriushchenko, Naman D. Singh, Nicolas Flammarion, Matthias Hein

We propose a versatile framework based on random search, Sparse-RS, for score-based sparse targeted and untargeted attacks in the black-box setting.

Malware Detection

Square Attack: a query-efficient black-box adversarial attack via random search

1 code implementation ECCV 2020 Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, Matthias Hein

We propose the Square Attack, a score-based black-box $l_2$- and $l_\infty$-adversarial attack that does not rely on local gradient information and thus is not affected by gradient masking.

Adversarial Attack
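
A generic score-based random-search attack loop in the spirit described above, hedged: it uses simple coordinate-wise proposals rather than the square-shaped updates that give the Square Attack its name, and the black-box "model" is a stand-in linear scorer:

```python
import numpy as np

rng = np.random.default_rng(2)
d, eps, n_queries = 100, 0.05, 500
W = rng.normal(size=(10, d))                 # stand-in black-box linear classifier

def margin(x, label):
    """True-class logit minus best other logit; the attack drives this below 0."""
    logits = W @ x
    return logits[label] - np.delete(logits, label).max()

x, label = rng.normal(size=d), 3
delta = np.zeros(d)
best = margin(x + delta, label)

for _ in range(n_queries):
    cand = delta.copy()
    idx = rng.choice(d, size=5, replace=False)
    cand[idx] = rng.choice([-eps, eps], size=5)   # random l_inf-bounded proposal
    score = margin(x + cand, label)
    if score < best:                              # keep the proposal only if it helps
        delta, best = cand, score

print("final margin (negative means misclassified):", round(best, 3))
```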

An Efficient Sampling Algorithm for Non-smooth Composite Potentials

no code implementations 1 Oct 2019 Wenlong Mou, Nicolas Flammarion, Martin J. Wainwright, Peter L. Bartlett

We consider the problem of sampling from a density of the form $p(x) \propto \exp(-f(x)- g(x))$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$ is a smooth and strongly convex function and $g: \mathbb{R}^d \rightarrow \mathbb{R}$ is a convex and Lipschitz function.
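
As a simple point of reference for the sampling problem stated above, a subgradient Langevin (ULA-style) sampler for $p(x) \propto \exp(-f(x) - g(x))$ with a quadratic $f$ and $g(x) = \lambda \|x\|_1$; this is a standard baseline, not the algorithm proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
d, lam, gamma, n_steps = 5, 1.0, 1e-3, 20000

A = 2.0 * np.eye(d)                          # f(x) = 0.5 x^T A x: smooth, strongly convex
grad_f = lambda x: A @ x
subgrad_g = lambda x: lam * np.sign(x)       # subgradient of g(x) = lam * ||x||_1

x = rng.normal(size=d)
samples = []
for k in range(n_steps):
    noise = np.sqrt(2 * gamma) * rng.normal(size=d)
    x = x - gamma * (grad_f(x) + subgrad_g(x)) + noise
    if k >= n_steps // 2:                    # discard the first half as burn-in
        samples.append(x.copy())

print("empirical mean (should be near 0):", np.mean(samples, axis=0).round(3))
```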

Escaping from saddle points on Riemannian manifolds

no code implementations NeurIPS 2019 Yue Sun, Nicolas Flammarion, Maryam Fazel

We consider minimizing a nonconvex, smooth function $f$ on a Riemannian manifold $\mathcal{M}$.

Fast Mean Estimation with Sub-Gaussian Rates

1 code implementation 6 Feb 2019 Yeshwanth Cherapanamjeri, Nicolas Flammarion, Peter L. Bartlett

We propose an estimator for the mean of a random vector in $\mathbb{R}^d$ that can be computed in time $O(n^4+n^2d)$ for $n$ i.i.d. samples and that has error bounds matching the sub-Gaussian case.

Is There an Analog of Nesterov Acceleration for MCMC?

no code implementations 4 Feb 2019 Yi-An Ma, Niladri Chatterji, Xiang Cheng, Nicolas Flammarion, Peter Bartlett, Michael I. Jordan

We formulate gradient-based Markov chain Monte Carlo (MCMC) sampling as optimization on the space of probability measures, with Kullback-Leibler (KL) divergence as the objective functional.

Gen-Oja: Simple & Efficient Algorithm for Streaming Generalized Eigenvector Computation

no code implementations NeurIPS 2018 Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

Gen-Oja: A Two-time-scale approach for Streaming CCA

no code implementations 20 Nov 2018 Kush Bhatia, Aldo Pacchiano, Nicolas Flammarion, Peter L. Bartlett, Michael I. Jordan

In this paper, we study the problems of principal Generalized Eigenvector computation and Canonical Correlation Analysis in the stochastic setting.

Vocal Bursts Valence Prediction

Sampling Can Be Faster Than Optimization

no code implementations 20 Nov 2018 Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan

Optimization algorithms and Monte Carlo sampling algorithms have provided the computational foundations for the rapid growth in applications of statistical machine learning in recent years.

Averaging Stochastic Gradient Descent on Riemannian Manifolds

no code implementations 26 Feb 2018 Nilesh Tripuraneni, Nicolas Flammarion, Francis Bach, Michael I. Jordan

We consider the minimization of a function defined on a Riemannian manifold $\mathcal{M}$ accessible only through unbiased estimates of its gradients.

Riemannian optimization

On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo

no code implementations ICML 2018 Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion.

Stochastic Composite Least-Squares Regression with convergence rate O(1/n)

no code implementations 21 Feb 2017 Nicolas Flammarion, Francis Bach

We consider the minimization of composite objective functions composed of the expectation of quadratic functions and an arbitrary convex function.

regression

Robust Discriminative Clustering with Sparse Regularizers

no code implementations 29 Aug 2016 Nicolas Flammarion, Balamurugan Palaniappan, Francis Bach

Clustering high-dimensional data often requires some form of dimensionality reduction, where clustered variables are separated from "noise-looking" variables.

Clustering · Dimensionality Reduction

Optimal Rates of Statistical Seriation

no code implementations 8 Jul 2016 Nicolas Flammarion, Cheng Mao, Philippe Rigollet

Given a matrix, the seriation problem consists in permuting its rows in such a way that all its columns have the same shape; for example, they are all monotone increasing.

Denoising

Harder, Better, Faster, Stronger Convergence Rates for Least-Squares Regression

no code implementations 17 Feb 2016 Aymeric Dieuleveut, Nicolas Flammarion, Francis Bach

We consider the optimization of a quadratic objective function whose gradients are only accessible through a stochastic oracle that returns the gradient at any given point plus a zero-mean finite variance random error.

regression
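
A minimal sketch of constant-step-size SGD with iterate averaging for least squares under a stochastic gradient oracle of the kind described above; averaging is the standard device studied in this setting, but the constants below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_iter = 20, 20000
w_star = rng.normal(size=d)

w = np.zeros(d)
w_avg = np.zeros(d)
step = 0.05                                  # constant step size (illustrative)

for t in range(1, n_iter + 1):
    x = rng.normal(size=d)
    y = x @ w_star + 0.5 * rng.normal()
    grad = (x @ w - y) * x                   # stochastic gradient of 0.5 * (<x, w> - y)^2
    w -= step * grad
    w_avg += (w - w_avg) / t                 # running average of the iterates

print("last-iterate error   :", np.linalg.norm(w - w_star))
print("averaged-iterate error:", np.linalg.norm(w_avg - w_star))
```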

From Averaging to Acceleration, There is Only a Step-size

no code implementations 7 Apr 2015 Nicolas Flammarion, Francis Bach

We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for non-strongly-convex problems may be reformulated as constant-parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate $O(1/n^2)$, with $n$ the number of iterations.
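
Concretely, the generic constant-parameter second-order recursion referred to above can be written in heavy-ball form as

$$ x_{n+1} = x_n - \gamma \nabla f(x_n) + \delta\,(x_n - x_{n-1}), $$

where particular choices of the step size $\gamma$ and momentum parameter $\delta$ recover the averaged and heavy-ball schemes (Nesterov acceleration evaluates the gradient at an extrapolated point instead); the paper's unified parametrisation may differ in these details.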
