Search Results for author: Samy Jelassi

Found 19 papers, 6 papers with code

Extra-gradient with player sampling for faster convergence in n-player games

no code implementations • ICML 2020 • Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models

1 code implementation • 22 Feb 2024 • Kenneth Li, Samy Jelassi, Hugh Zhang, Sham Kakade, Martin Wattenberg, David Brandfonbrener

The idea is to learn a simple linear function on a model's embedding space that can be used to reweight candidate completions.

Code Generation • Language Modelling
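As a rough illustration of the reweighting idea in the abstract above (not the authors' code — the embeddings, probe weights, and function names below are hypothetical), a linear probe scores each candidate completion's embedding and a softmax over those scores gives reweighting weights:

```python
import numpy as np

def q_probe_reweight(embeddings, w, beta=1.0):
    """Score candidate completions with a linear probe on their embeddings,
    then form softmax weights for reweighting the candidates.

    embeddings: (k, d) array, one embedding per candidate completion.
    w:          (d,) linear probe weights (assumed learned separately).
    beta:       temperature controlling how sharply high scores are favored.
    """
    scores = embeddings @ w                 # linear value estimate per candidate
    z = beta * (scores - scores.max())      # shift for numerical stability
    weights = np.exp(z) / np.exp(z).sum()   # softmax over candidates
    return scores, weights

# Toy usage: 3 candidate completions with 4-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(3, 4))
w = rng.normal(size=4)
scores, weights = q_probe_reweight(emb, w)
```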

Repeat After Me: Transformers are Better than State Space Models at Copying

1 code implementation • 1 Feb 2024 • Samy Jelassi, David Brandfonbrener, Sham M. Kakade, Eran Malach

Empirically, we find that transformers outperform GSSMs in terms of efficiency and generalization on synthetic tasks that require copying the context.
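A minimal sketch of what such a synthetic copying task can look like (my own construction, assuming a reserved COPY marker token — not necessarily the paper's exact setup):

```python
import numpy as np

def make_copy_task(n_examples, seq_len, vocab_size, seed=0):
    """Generate a synthetic copying task: the input is a random token
    string followed by a COPY marker, and the target is the string itself.

    Token 0 is reserved as the COPY marker; content tokens are 1..vocab_size-1.
    """
    rng = np.random.default_rng(seed)
    content = rng.integers(1, vocab_size, size=(n_examples, seq_len))
    marker = np.zeros((n_examples, 1), dtype=content.dtype)
    inputs = np.concatenate([content, marker], axis=1)  # [tokens..., COPY]
    targets = content.copy()                            # model must reproduce the context
    return inputs, targets

x, y = make_copy_task(n_examples=8, seq_len=16, vocab_size=32)
```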

Length Generalization in Arithmetic Transformers

no code implementations • 27 Jun 2023 • Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton

We find that relative position embeddings enable length generalization for simple tasks such as addition: models trained on $5$-digit numbers can perform $15$-digit sums.

Position
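Why relative (rather than absolute) position embeddings can extrapolate in length is easy to see concretely: the attention bias depends only on clipped offsets key − query, so the same parameters are reused at every absolute position. A small sketch of such an offset matrix (T5-style clipping; my own illustration, not the paper's code):

```python
import numpy as np

def relative_position_matrix(q_len, k_len, max_distance=8):
    """Build the matrix of clipped relative offsets key_pos - query_pos.

    A bias table indexed by this matrix reuses the same parameters at every
    absolute position, which is what allows application to sequences longer
    than those seen in training.
    """
    q = np.arange(q_len)[:, None]
    k = np.arange(k_len)[None, :]
    return np.clip(k - q, -max_distance, max_distance)

rel = relative_position_matrix(5, 5, max_distance=3)
```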

Depth Dependence of $\mu$P Learning Rates in ReLU MLPs

no code implementations • 13 May 2023 • Samy Jelassi, Boris Hanin, Ziwei Ji, Sashank J. Reddi, Srinadh Bhojanapalli, Sanjiv Kumar

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization.
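For concreteness, here is one common convention for a mean-field-style parameterization of a fully connected ReLU network: O(1) i.i.d. Gaussian weights with preactivations averaged by 1/fan_in rather than the usual 1/sqrt(fan_in). This is my own sketch of that convention; the note's precise $\mu$P scaling may differ in details:

```python
import numpy as np

def mean_field_mlp(widths, seed=0):
    """Initialize layer weight matrices with O(1) i.i.d. Gaussian entries
    (mean-field-style: the 1/fan_in scaling is applied in the forward pass)."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(m, n)) for n, m in zip(widths[:-1], widths[1:])]

def forward(layers, x):
    h = x
    for i, W in enumerate(layers):
        z = W @ h / W.shape[1]  # mean-field 1/fan_in averaging
        h = np.maximum(z, 0.0) if i < len(layers) - 1 else z  # ReLU except last layer
    return h

# Width-64, depth-3 network mapping R^4 -> R.
layers = mean_field_mlp([4, 64, 64, 1])
out = forward(layers, np.ones(4))
```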

Vision Transformers provably learn spatial structure

no code implementations • 13 Oct 2022 • Samy Jelassi, Michael E. Sander, Yuanzhi Li

On the theoretical side, we consider a binary classification task and show that, while the learning problem admits multiple solutions that generalize, our model implicitly learns the spatial structure of the dataset as it generalizes; we call this phenomenon patch association.

Binary Classification • Inductive Bias

Dissecting adaptive methods in GANs

no code implementations • 9 Oct 2022 • Samy Jelassi, David Dobre, Arthur Mensch, Yuanzhi Li, Gauthier Gidel

By considering an update rule with the magnitude of the Adam update and the normalized direction of SGD, we empirically show that the adaptive magnitude of Adam is key for GAN training.
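The diagnostic update rule described above (Adam's magnitude combined with SGD's normalized direction) can be sketched as follows. This is my own reconstruction from the abstract, with standard Adam moment buffers and bias correction, not the authors' implementation:

```python
import numpy as np

def hybrid_update(g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One step of the hybrid rule: take the normalized SGD direction
    g/||g|| but scale it by the norm of the Adam update.

    m, v are the Adam first/second moment buffers; t is the 1-based step count.
    """
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)                   # standard Adam bias correction
    v_hat = v / (1 - b2 ** t)
    adam_step = m_hat / (np.sqrt(v_hat) + eps)  # what Adam would apply
    direction = g / (np.linalg.norm(g) + eps)   # normalized SGD direction
    update = lr * np.linalg.norm(adam_step) * direction  # Adam's magnitude only
    return update, m, v

g = np.array([3.0, 4.0])
update, m, v = hybrid_update(g, np.zeros(2), np.zeros(2), t=1)
```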

Towards understanding how momentum improves generalization in deep learning

no code implementations • 13 Jul 2022 • Samy Jelassi, Yuanzhi Li

Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures.

Binary Classification
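For reference, the heavy-ball momentum update the abstract refers to is the standard one below (a textbook sketch on a toy quadratic, not the paper's code):

```python
import numpy as np

def sgd_momentum_step(w, g, buf, lr=0.1, beta=0.9):
    """Heavy-ball SGD: accumulate a geometrically decayed sum of past
    gradients in buf and step in that accumulated direction."""
    buf = beta * buf + g
    w = w - lr * buf
    return w, buf

# Toy quadratic f(w) = 0.5 * ||w||^2, whose gradient at w is w itself.
w = np.array([1.0, -2.0])
buf = np.zeros(2)
for _ in range(200):
    w, buf = sgd_momentum_step(w, w, buf)
```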

Adam is no better than normalized SGD: Dissecting how adaptivity improves GAN performance

no code implementations • 29 Sep 2021 • Samy Jelassi, Arthur Mensch, Gauthier Gidel, Yuanzhi Li

We empirically show that SGDA with the same vector norm as Adam reaches similar or even better performance than the latter.

Depth separation beyond radial functions

no code implementations • 2 Feb 2021 • Luca Venturi, Samy Jelassi, Tristan Ozuch, Joan Bruna

The first contribution of this paper is to extend such results to a more general class of functions, namely functions with piece-wise oscillatory structure, by building on the proof strategy of (Eldan and Shamir, 2016).

Dual Averaging is Surprisingly Effective for Deep Learning Optimization

no code implementations • 20 Oct 2020 • Samy Jelassi, Aaron Defazio

First-order stochastic optimization methods are currently the most widely used class of methods for training deep neural networks.

Stochastic Optimization
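To make the contrast with SGD concrete: dual averaging does not step from the current iterate, but re-solves from the initial point against the running sum of all past gradients. Below is a simple classical form with a 1/sqrt(t) scaling (my own sketch; the paper's modernized variant differs in details):

```python
import numpy as np

def dual_averaging_step(x0, g_sum, g, t, lr=0.5):
    """One step of a simple dual averaging scheme: accumulate all past
    gradients and map the running sum back from the initial point x0."""
    g_sum = g_sum + g
    x = x0 - lr * g_sum / np.sqrt(t)  # one standard scaling choice
    return x, g_sum

# Toy problem: minimize 0.5 * ||x - target||^2 (gradient at x is x - target).
target = np.array([1.0, 0.0])
x0 = np.zeros(2)
x, g_sum = x0.copy(), np.zeros(2)
for t in range(1, 101):
    x, g_sum = dual_averaging_step(x0, g_sum, x - target, t)
```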

Auction learning as a two-player game

no code implementations • ICLR 2021 • Jad Rahme, Samy Jelassi, S. Matthew Weinberg

This not only circumvents the need for an expensive hyper-parameter search (as in prior work), but also provides a principled metric to compare the performance of two auctions (absent from prior work).


A Permutation-Equivariant Neural Network Architecture For Auction Design

1 code implementation • 2 Mar 2020 • Jad Rahme, Samy Jelassi, Joan Bruna, S. Matthew Weinberg

Designing an incentive-compatible auction that maximizes expected revenue is a central problem in Auction Design.

Towards closing the gap between the theory and practice of SVRG

1 code implementation • NeurIPS 2019 • Othmane Sebbouh, Nidham Gazagnadou, Samy Jelassi, Francis Bach, Robert M. Gower

Among the very first variance-reduced stochastic methods for solving the empirical risk minimization problem was the SVRG method (Johnson & Zhang, 2013).
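The SVRG update itself is short: compute the full gradient once at a snapshot, then take cheap inner steps with the variance-reduced gradient g_i(w) − g_i(w_ref) + full_grad(w_ref). A sketch on least squares (my own illustration of Johnson & Zhang's method, not this paper's code):

```python
import numpy as np

def svrg_epoch(w, X, y, lr=0.05, inner_steps=50, seed=0):
    """One outer epoch of SVRG for least squares 0.5*mean((Xw - y)^2)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    w_ref = w.copy()
    full_grad = X.T @ (X @ w_ref - y) / n      # expensive: computed once per epoch
    for _ in range(inner_steps):
        i = rng.integers(n)                    # cheap: one sample per inner step
        gi = X[i] * (X[i] @ w - y[i])
        gi_ref = X[i] * (X[i] @ w_ref - y[i])
        w = w - lr * (gi - gi_ref + full_grad)  # variance-reduced gradient
    return w

# Noiseless synthetic regression problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true
w = np.zeros(3)
for _ in range(20):
    w = svrg_epoch(w, X, y)
```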

Extragradient with player sampling for faster Nash equilibrium finding

1 code implementation • 29 May 2019 • Carles Domingo-Enrich, Samy Jelassi, Damien Scieur, Arthur Mensch, Joan Bruna

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.

Global convergence of neuron birth-death dynamics

no code implementations • 5 Feb 2019 • Grant Rotskoff, Samy Jelassi, Joan Bruna, Eric Vanden-Eijnden

Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of "overparameterized" models.

Smoothed analysis of the low-rank approach for smooth semidefinite programs

no code implementations • NeurIPS 2018 • Thomas Pumir, Samy Jelassi, Nicolas Boumal

In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over an $n \times k$ matrix $Y$ such that $X = YY^*$ is the SDP variable.

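The point of the Burer–Monteiro substitution is that any $X = YY^*$ is automatically positive semidefinite and has rank at most $k$, so the PSD constraint disappears from the optimization. A minimal real-valued sketch (my own illustration, with an arbitrary cost matrix):

```python
import numpy as np

def burer_monteiro_objective(Y, C):
    """Burer-Monteiro: replace the n x n PSD variable X by a factor
    Y (n x k) with X = Y Y^T, and evaluate trace(C X) over Y directly.
    Any X of this form is positive semidefinite with rank at most k."""
    X = Y @ Y.T
    return np.trace(C @ X), X

rng = np.random.default_rng(0)
n, k = 6, 2
Y = rng.normal(size=(n, k))
C = np.eye(n)  # toy cost matrix; trace(X) = ||Y||_F^2 in this case
val, X = burer_monteiro_objective(Y, C)
```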
