Search Results for author: Stéphane d'Ascoli

Found 19 papers, 13 papers with code

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

no code implementations • ICML 2020 • Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

We demonstrate that the latter two contributions are the crux of the double descent: they lead to the overfitting peak at the interpolation threshold and to the decay of the test error upon overparametrization.

ODEFormer: Symbolic Regression of Dynamical Systems with Transformers

1 code implementation • 9 Oct 2023 • Stéphane d'Ascoli, Sören Becker, Alexander Mathis, Philippe Schwaller, Niki Kilbertus

We introduce ODEFormer, the first transformer able to infer multidimensional ordinary differential equation (ODE) systems in symbolic form from the observation of a single solution trajectory.

regression • Symbolic Regression
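
A minimal sketch of the kind of input ODEFormer works from: a single observed solution trajectory (times and states) of an unknown ODE system. The Lotka-Volterra example, its parameters, and the sampling grid are illustrative assumptions; the model's own inference API is not shown here.

```python
import numpy as np
from scipy.integrate import solve_ivp

# An unknown-to-the-model 2D dynamical system (illustrative choice)
def lotka_volterra(t, z, a=1.0, b=0.4, c=0.4, d=0.1):
    x, y = z
    return [a * x - b * x * y, -c * y + d * x * y]

# Observe a single solution trajectory from one initial condition
t_eval = np.linspace(0.0, 20.0, 200)
sol = solve_ivp(lotka_volterra, (0.0, 20.0), [10.0, 5.0], t_eval=t_eval)
times, states = sol.t, sol.y.T   # shapes (200,) and (200, 2)

# ODEFormer's task: map (times, states) to symbolic expressions
# for dx/dt and dy/dt
```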

Boolformer: Symbolic Regression of Logic Functions with Transformers

1 code implementation • 21 Sep 2023 • Stéphane d'Ascoli, Samy Bengio, Josh Susskind, Emmanuel Abbé

In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions.

Binary Classification • regression • +1
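
A sketch of the task format only, not of Boolformer itself: given the full truth table of an unknown Boolean function, recover a compact symbolic formula. Sympy's classical sum-of-products synthesis stands in for the trained Transformer here, and the hidden function is a made-up example.

```python
from itertools import product
from sympy import symbols
from sympy.logic import SOPform

x1, x2, x3 = symbols("x1 x2 x3")
hidden = lambda a, b, c: (a and not b) or c       # unknown target function

# Enumerate the truth table and keep the rows where the output is 1
minterms = [list(bits) for bits in product([0, 1], repeat=3)
            if hidden(*bits)]

# Classical synthesis recovers a compact formula, e.g. x3 | (x1 & ~x2)
print(SOPform([x1, x2, x3], minterms))
```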

Length Generalization in Arithmetic Transformers

no code implementations • 27 Jun 2023 • Samy Jelassi, Stéphane d'Ascoli, Carles Domingo-Enrich, Yuhuai Wu, Yuanzhi Li, François Charton

We find that relative position embeddings enable length generalization for simple tasks, such as addition: models trained on $5$-digit numbers can perform $15$-digit sums.

Position
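
A minimal sketch of why relative position embeddings can generalize in length, assuming a T5-style relative bias rather than the paper's exact construction: attention scores depend only on the clipped offset between positions, so biases learned on short sequences transfer unchanged to longer ones. All names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class RelativeBiasAttention(nn.Module):
    def __init__(self, dim, max_distance=8):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.max_distance = max_distance
        # one learned bias per clipped relative distance in [-max, max]
        self.bias = nn.Parameter(torch.zeros(2 * max_distance + 1))
        self.scale = dim ** -0.5

    def forward(self, x):                       # x: (batch, seq, dim)
        n = x.size(1)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) * self.scale
        rel = torch.arange(n)[None, :] - torch.arange(n)[:, None]
        rel = rel.clamp(-self.max_distance, self.max_distance) + self.max_distance
        scores = scores + self.bias[rel]        # same bias table at any length
        return scores.softmax(dim=-1) @ v

# Trained on short inputs, the bias table is reused unchanged on longer ones
attn = RelativeBiasAttention(dim=64)
out = attn(torch.randn(2, 48, 64))
```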

End-to-end symbolic regression with transformers

3 code implementations • 22 Apr 2022 • Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton

Symbolic regression, the task of predicting the mathematical expression of a function from the observation of its values, is difficult and usually involves a two-step procedure: predicting the "skeleton" of the expression up to the choice of numerical constants, then fitting those constants by optimizing a non-convex loss function.

regression • Symbolic Regression
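
A minimal sketch of the two-step baseline described above, not of the paper's end-to-end model: given a predicted skeleton with placeholder constants, the constants are fitted by minimizing a non-convex squared loss from random restarts. The skeleton and data are made up for illustration.

```python
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-3, 3, 200)
y = 1.5 * np.sin(2.0 * x) + 0.3                 # observed function values

def skeleton(c, x):                             # predicted shape: c0*sin(c1*x)+c2
    return c[0] * np.sin(c[1] * x) + c[2]

def loss(c):                                    # non-convex in c1
    return np.mean((skeleton(c, x) - y) ** 2)

# BFGS from several random restarts, since the loss has spurious minima
best = min((minimize(loss, np.random.randn(3), method="BFGS")
            for _ in range(10)), key=lambda r: r.fun)
print(best.x, best.fun)
```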

Optimal learning rate schedules in high-dimensional non-convex optimization problems

no code implementations • 9 Feb 2022 • Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli

In this case, it is optimal to keep a large learning rate during the exploration phase to escape the non-convex region as quickly as possible, then use the convex criterion $\beta=1$ to converge rapidly to the solution.

Scheduling • Vocal Bursts Intensity Prediction
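
A minimal sketch of the schedule described above: hold a large learning rate during the exploration phase, then decay with the convex-optimal exponent $\beta = 1$. The values of eta0 and t0 are illustrative, not taken from the paper.

```python
def lr_schedule(t, eta0=0.5, t0=1000, beta=1.0):
    """Two-phase schedule: constant exploration, then power-law decay."""
    if t <= t0:
        return eta0                      # escape the non-convex region quickly
    return eta0 / (t - t0) ** beta       # beta = 1: rapid convergence phase

lrs = [lr_schedule(t) for t in range(1, 5000)]
```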

Deep Symbolic Regression for Recurrent Sequences

no code implementations • 12 Jan 2022 • Stéphane d'Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton

Symbolic regression, i.e. predicting a function from the observation of its values, is well known to be a challenging task.

regression • Symbolic Regression

Transformed CNNs: recasting pre-trained convolutional layers with self-attention

no code implementations • 10 Jun 2021 • Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Ari Morcos

Finally, we experiment with initializing the T-CNN from a partially trained CNN, and find that it reaches better performance than the corresponding hybrid model trained from scratch, while reducing training time.

ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

9 code implementations • 19 Mar 2021 • Stéphane d'Ascoli, Hugo Touvron, Matthew Leavitt, Ari Morcos, Giulio Biroli, Levent Sagun

We initialise the GPSA layers to mimic the locality of convolutional layers, then give each attention head the freedom to escape locality by adjusting a gating parameter regulating the attention paid to position versus content information.

Image Classification • Inductive Bias
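
A simplified sketch of the gating idea in GPSA, with 1D locality standing in for the paper's 2D patch geometry: each head mixes content attention with a fixed positional attention map through a learned gate, initialized so the layer starts out local and convolution-like.

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    def __init__(self, dim, n):
        super().__init__()
        self.qk = nn.Linear(dim, 2 * dim)
        self.v = nn.Linear(dim, dim)
        # sigmoid(3) ~ 0.95: attention starts dominated by the positional map
        self.gate = nn.Parameter(torch.tensor(3.0))
        # positional scores favouring nearby tokens (1D stand-in for locality)
        idx = torch.arange(n)
        self.register_buffer("pos", -(idx[None, :] - idx[:, None]).abs().float())

    def forward(self, x):                       # x: (batch, n, dim)
        q, k = self.qk(x).chunk(2, dim=-1)
        content = (q @ k.transpose(-2, -1) / x.size(-1) ** 0.5).softmax(-1)
        positional = self.pos.softmax(-1)
        g = torch.sigmoid(self.gate)            # each head can learn to escape locality
        return ((1 - g) * content + g * positional) @ self.v(x)

layer = GatedAttention(dim=64, n=49)            # e.g. a 7x7 patch grid
out = layer(torch.randn(2, 49, 64))
```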

On the interplay between data structure and loss function in classification problems

1 code implementation • NeurIPS 2021 • Stéphane d'Ascoli, Marylou Gabrié, Levent Sagun, Giulio Biroli

One of the central puzzles in modern machine learning is the ability of heavily overparametrized models to generalize well.

Align, then memorise: the dynamics of learning with feedback alignment

1 code implementation • 24 Nov 2020 • Maria Refinetti, Stéphane d'Ascoli, Ruben Ohana, Sebastian Goldt

Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to the ubiquitous backpropagation algorithm for training deep neural networks.
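
A minimal numpy sketch of the DFA update for a network with two hidden layers: the output error is sent to each hidden layer through a fixed random matrix, rather than backpropagated through the transposed forward weights. Sizes, data, and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, o, lr = 10, 32, 1, 0.05
W1, W2, W3 = (rng.standard_normal(s) * 0.1 for s in [(d, h), (h, h), (h, o)])
B1, B2 = rng.standard_normal((o, h)), rng.standard_normal((o, h))  # fixed, random

def tanh_prime(a):
    return 1 - np.tanh(a) ** 2

x = rng.standard_normal((64, d)); y = rng.standard_normal((64, o))
for _ in range(100):
    a1 = x @ W1; h1 = np.tanh(a1)               # forward pass
    a2 = h1 @ W2; h2 = np.tanh(a2)
    e = h2 @ W3 - y                             # output error
    # DFA: project e through fixed B_i; forward weights are never transposed
    d1 = (e @ B1) * tanh_prime(a1)
    d2 = (e @ B2) * tanh_prime(a2)
    W1 -= lr * x.T @ d1 / len(x)
    W2 -= lr * h1.T @ d2 / len(x)
    W3 -= lr * h2.T @ e / len(x)
```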

Conditioned Text Generation with Transfer for Closed-Domain Dialogue Systems

1 code implementation • 3 Nov 2020 • Stéphane d'Ascoli, Alice Coucke, Francesco Caltagirone, Alexandre Caulier, Marc Lelarge

Scarcity of training data for task-oriented dialogue systems is a well-known problem that is usually tackled with costly and time-consuming manual data annotation.

Data Augmentation • Language Modelling • +2

Triple descent and the two kinds of overfitting: Where & why do they appear?

1 code implementation • NeurIPS 2020 • Stéphane d'Ascoli, Levent Sagun, Giulio Biroli

We show that this peak is implicitly regularized by the nonlinearity, which is why it only becomes salient at high noise and is weakly affected by explicit regularization.

regression

Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime

2 code implementations • 2 Mar 2020 • Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli, Florent Krzakala

We obtain a precise asymptotic expression for the bias-variance decomposition of the test error, and show that the bias displays a phase transition at the interpolation threshold, beyond which it remains constant.
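
A small empirical sketch of the lazy-regime picture (the paper derives exact asymptotics; this is only a simulation under illustrative sizes): min-norm regression on random features shows the test-error peak at the interpolation threshold p = n, followed by decay under overparametrization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 100, 20, 1000
w = rng.standard_normal(d)
X, Xt = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y, yt = X @ w + 0.5 * rng.standard_normal(n), Xt @ w

for p in [20, 50, 100, 200, 800]:       # p = n is the interpolation threshold
    F = rng.standard_normal((d, p)) / np.sqrt(d)   # fixed random projection
    Z, Zt = np.tanh(X @ F), np.tanh(Xt @ F)        # random features (lazy: only readout trained)
    a = np.linalg.pinv(Z) @ y                      # min-norm interpolator
    print(p, np.mean((Zt @ a - yt) ** 2))          # test error peaks near p = n
```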

Conditioned Query Generation for Task-Oriented Dialogue Systems

1 code implementation • 9 Nov 2019 • Stéphane d'Ascoli, Alice Coucke, Francesco Caltagirone, Alexandre Caulier, Marc Lelarge

Scarcity of training data for task-oriented dialogue systems is a well-known problem that is usually tackled with costly and time-consuming manual data annotation.

Task-Oriented Dialogue Systems • Text Generation

A jamming transition from under- to over-parametrization affects loss landscape and generalization

no code implementations • 22 Oct 2018 • Stefano Spigler, Mario Geiger, Stéphane d'Ascoli, Levent Sagun, Giulio Biroli, Matthieu Wyart

We argue that in fully-connected networks a phase transition delimits the over- and under-parametrized regimes where fitting can or cannot be achieved.
