Search Results for author: Samy Jelassi

Found 19 papers, 6 papers with code

Extra-gradient with player sampling for faster convergence in n-player games

no code implementations • ICML 2020 • Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.

Code Generation • Language Modelling
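As a rough illustration of the reweighting idea in the abstract above (not the authors' code — the embeddings, probe weights, and function names below are hypothetical), a linear probe scores each candidate completion's embedding and a softmax over those scores gives reweighting weights:

```python
import numpy as np

def q_probe_reweight(embeddings, w, beta=1.0):
    """Score candidate completions with a linear probe on their embeddings,
    then form softmax weights for reweighting the candidates.

    embeddings: (k, d) array, one embedding per candidate completion.
    w:          (d,) linear probe weights (assumed learned separately).
    beta:       temperature controlling how sharply high scores are favored.
    """
    scores = embeddings @ w                 # linear value estimate per candidate
    z = beta * (scores - scores.max())      # shift for numerical stability
    weights = np.exp(z) / np.exp(z).sum()   # softmax over candidates
    return scores, weights

# Toy usage: 3 candidate completions with 4-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4))
w = rng.normal(size=4)
scores, weights = q_probe_reweight(emb, w)
```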

Repeat After Me: Transformers are Better than State Space Models at Copying

1 code implementation • 1 Feb 2024 • Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach

Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context.
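A minimal sketch of what such a synthetic copying task can look like (my own construction, assuming a reserved COPY marker token — not necessarily the paper's exact setup):

```python
import numpy as np

def make_copy_task(n_examples, seq_len, vocab_size, seed=0):
    """Generate a synthetic copying task: the input is a random token
    string followed by a COPY marker, and the target is the string itself.

    Token 0 is reserved as the COPY marker; content tokens are 1..vocab_size-1.
    """
    rng = np.random.default_rng(seed)
    content = rng.integers(1, vocab_size, size=(n_examples, seq_len))
    marker = np.zeros((n_examples, 1), dtype=content.dtype)
    inputs = np.concatenate([content, marker], axis=1)  # [tokens..., COPY]
    targets = content.copy()                            # model must reproduce the context
    return inputs, targets

x, y = make_copy_task(n_examples=8, seq_len=16, vocab_size=32)
```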

Length Generalization in Arithmetic Transformers

no code implementations • 27 Jun 2023 • Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton

We find that relative position embeddings enable length generalization for simple tasks such as addition: models trained on $5$-digit numbers can perform $15$-digit sums.

Position
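Why relative (rather than absolute) position embeddings can extrapolate in length is easy to see concretely: the attention bias depends only on clipped offsets key − query, so the same parameters are reused at every absolute position. A small sketch of such an offset matrix (T5-style clipping; my own illustration, not the paper's code):

```python
import numpy as np

def relative_position_matrix(q_len, k_len, max_distance=8):
    """Build the matrix of clipped relative offsets key_pos - query_pos.

    A bias table indexed by this matrix reuses the same parameters at every
    absolute position, which is what allows application to sequences longer
    than those seen in training.
    """
    q = np.arange(q_len)[:, None]
    k = np.arange(k_len)[None, :]
    return np.clip(k - q, -max_distance, max_distance)

rel = relative_position_matrix(5, 5, max_distance=3)
```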

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

no code implementations • 13 May 2023 • Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.
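For concreteness, here is one common convention for a mean-field-style parameterization of a fully connected ReLU network: O(1) i.i.d. Gaussian weights with preactivations averaged by 1/fan_in rather than the usual 1/sqrt(fan_in). This is my own sketch of that convention; the note's precise $\mu$P scaling may differ in details:

```python
import numpy as np

def mean_field_mlp(widths, seed=0):
    """Initialize layer weight matrices with O(1) i.i.d. Gaussian entries
    (mean-field-style: the 1/fan_in scaling is applied in the forward pass)."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(m, n)) for n, m in zip(widths[:-1], widths[1:])]

def forward(layers, x):
    h = x
    for i, W in enumerate(layers):
        z = W @ h / W.shape[1]  # mean-field 1/fan_in averaging
        h = np.maximum(z, 0.0) if i < len(layers) - 1 else z  # ReLU except last layer
    return h

# Width-64, depth-3 network mapping R^4 -> R.
layers = mean_field_mlp([4, 64, 64, 1])
out = forward(layers, np.ones(4))
```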

Vision Transformers provably learn spatial structure

no code implementations • 13 Oct 2022 • Samy Jelassi, Michael E. Sander, Yuanzhi Li

On the theoretical side, we consider a binary classification task and show that, while the learning problem admits multiple solutions that generalize, our model implicitly learns the spatial structure of the dataset as it generalizes; we call this phenomenon patch association.

Binary Classification • Inductive Bias

Dissecting adaptive methods in GANs

no code implementations • 9 Oct 2022 • Samy Jelassi, David Dobre, Arthur Mensch, Yuanzhi Li, Gauthier Gidel

By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training.
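The diagnostic update rule described above (Adam's magnitude combined with SGD's normalized direction) can be sketched as follows. This is my own reconstruction from the abstract, with standard Adam moment buffers and bias correction, not the authors' implementation:

```python
import numpy as np

def hybrid_update(g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One step of the hybrid rule: take the normalized SGD direction
    g/||g|| but scale it by the norm of the Adam update.

    m, v are the Adam first/second moment buffers; t is the 1-based step count.
    """
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)                   # standard Adam bias correction
    v_hat = v / (1 - b2 ** t)
    adam_step = m_hat / (np.sqrt(v_hat) + eps)  # what Adam would apply
    direction = g / (np.linalg.norm(g) + eps)   # normalized SGD direction
    update = lr * np.linalg.norm(adam_step) * direction  # Adam's magnitude only
    return update, m, v

g = np.array([3.0, 4.0])
update, m, v = hybrid_update(g, np.zeros(2), np.zeros(2), t=1)
```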

Towards understanding how momentum improves generalization in deep learning

no code implementations • 13 Jul 2022 • Samy Jelassi, Yuanzhi Li

Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures.

Binary Classification
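For reference, the heavy-ball momentum update the abstract refers to is the standard one below (a textbook sketch on a toy quadratic, not the paper's code):

```python
import numpy as np

def sgd_momentum_step(w, g, buf, lr=0.1, beta=0.9):
    """Heavy-ball SGD: accumulate a geometrically decayed sum of past
    gradients in buf and step in that accumulated direction."""
    buf = beta * buf + g
    w = w - lr * buf
    return w, buf

# Toy quadratic f(w) = 0.5 * ||w||^2, whose gradient at w is w itself.
w = np.array([1.0, -2.0])
buf = np.zeros(2)
for _ in range(200):
    w, buf = sgd_momentum_step(w, w, buf)
```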

Adam is no better than normalized SGD: Dissecting how adaptivity improves GAN performance

no code implementations • 29 Sep 2021 • Samy Jelassi, Arthur Mensch, Gauthier Gidel, Yuanzhi Li

We empirically show that SGDA with the same vector norm as Adam reaches similar or even better performance than the latter.

Depth separation beyond radial functions

no code implementations • 2 Feb 2021 • Luca Venturi, Samy Jelassi, Tristan Ozuch, Joan Bruna

The first contribution of this paper is to extend such results to a more general class of functions, namely functions with piece-wise oscillatory structure, by building on the proof strategy of (Eldan and Shamir, 2016).

Dual Averaging is Surprisingly Effective for Deep Learning Optimization

no code implementations • 20 Oct 2020 • Samy Jelassi, Aaron Defazio

First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks.

Stochastic Optimization
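To make the contrast with SGD concrete: dual averaging does not step from the current iterate, but re-solves from the initial point against the running sum of all past gradients. Below is a simple classical form with a 1/sqrt(t) scaling (my own sketch; the paper's modernized variant differs in details):

```python
import numpy as np

def dual_averaging_step(x0, g_sum, g, t, lr=0.5):
    """One step of a simple dual averaging scheme: accumulate all past
    gradients and map the running sum back from the initial point x0."""
    g_sum = g_sum + g
    x = x0 - lr * g_sum / np.sqrt(t)  # one standard scaling choice
    return x, g_sum

# Toy problem: minimize 0.5 * ||x - target||^2 (gradient at x is x - target).
target = np.array([1.0, 0.0])
x0 = np.zeros(2)
x, g_sum = x0.copy(), np.zeros(2)
for t in range(1, 101):
    x, g_sum = dual_averaging_step(x0, g_sum, x - target, t)
```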

Auction learning as a two-player game

no code implementations • ICLR 2021 • Jad Rahme, Samy Jelassi, S. Matthew Weinberg

This not only circumvents the need for an expensive hyper-parameter search (as in prior work), but also provides a principled metric to compare the performance of two auctions (absent from prior work).


A Permutation-Equivariant Neural Network Architecture For Auction Design

1 code implementation • 2 Mar 2020 • Jad Rahme, Samy Jelassi, Joan Bruna, S. Matthew Weinberg

Designing an incentive-compatible auction that maximizes expected revenue is a central problem in Auction Design.

Towards closing the gap between the theory and practice of SVRG

1 code implementation • NeurIPS 2019 • Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert M. Gower

Among the very first variance-reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang, 2013).
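The SVRG update itself is short: compute the full gradient once at a snapshot, then take cheap inner steps with the variance-reduced gradient g_i(w) − g_i(w_ref) + full_grad(w_ref). A sketch on least squares (my own illustration of Johnson & Zhang's method, not this paper's code):

```python
import numpy as np

def svrg_epoch(w, X, y, lr=0.05, inner_steps=50, seed=0):
    """One outer epoch of SVRG for least squares 0.5*mean((Xw - y)^2)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w_ref = w.copy()
    full_grad = X.T @ (X @ w_ref - y) / n      # expensive: computed once per epoch
    for _ in range(inner_steps):
        i = rng.integers(n)                    # cheap: one sample per inner step
        gi = X[i] * (X[i] @ w - y[i])
        gi_ref = X[i] * (X[i] @ w_ref - y[i])
        w = w - lr * (gi - gi_ref + full_grad)  # variance-reduced gradient
    return w

# Noiseless synthetic regression problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(20):
    w = svrg_epoch(w, X, y)
```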

Extragradient with player sampling for faster Nash equilibrium finding

1 code implementation • 29 May 2019 • Carles Domingo-Enrich, Samy Jelassi, Damien Scieur, Arthur Mensch, Joan Bruna

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.

Global convergence of neuron birth-death dynamics

no code implementations • 5 Feb 2019 • Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden

Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of "overparameterized" models.

Smoothed analysis of the low-rank approach for smooth semidefinite programs

no code implementations • NeurIPS 2018 • Thomas Pumir, Samy Jelassi, Nicolas Boumal

In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over an $n \times k$ matrix $Y$ such that $X = YY^*$ is the SDP variable.

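The point of the Burer–Monteiro substitution is that any $X = YY^*$ is automatically positive semidefinite and has rank at most $k$, so the PSD constraint disappears from the optimization. A minimal real-valued sketch (my own illustration, with an arbitrary cost matrix):

```python
import numpy as np

def burer_monteiro_objective(Y, C):
    """Burer-Monteiro: replace the n x n PSD variable X by a factor
    Y (n x k) with X = Y Y^T, and evaluate trace(C X) over Y directly.
    Any X of this form is positive semidefinite with rank at most k."""
    X = Y @ Y.T
    return np.trace(C @ X), X

rng = np.random.default_rng(0)
n, k = 6, 2
Y = rng.normal(size=(n, k))
C = np.eye(n)  # toy cost matrix; trace(X) = ||Y||_F^2 in this case
val, X = burer_monteiro_objective(Y, C)
```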
