no code implementations • 7 Apr 2024 • Mohamed El Amine Seddik, Suei-Wen Chen, Soufiane Hayou, Pierre Youssef, Merouane Debbah
With the aim of rigorously understanding model collapse in language models, we consider in this paper a statistical model that allows us to characterize the impact of various recursive training scenarios.
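The paper's statistical model is not reproduced in this snippet; as a loose illustration of fully recursive training, the sketch below repeatedly fits a Gaussian to samples drawn from the previous generation's fit (the Gaussian model, the sample sizes, and all variable names are placeholder assumptions, not the paper's setup).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50                                # samples available at every generation
data = rng.normal(0.0, 1.0, n)        # generation 0 trains on "real" data from N(0, 1)

for gen in range(1, 501):
    mu, sigma = data.mean(), data.std()   # fit the simple model to the current data
    data = rng.normal(mu, sigma, n)       # the next generation sees only synthetic samples
    if gen % 100 == 0:
        # the fitted sigma follows a multiplicative random walk with a slight downward drift,
        # so over many fully synthetic generations it typically shrinks toward 0
        print(f"generation {gen}: fitted sigma = {sigma:.3f}")
```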
1 code implementation • 19 Feb 2024 • Soufiane Hayou, Nikhil Ghosh, Bin Yu
In this paper, we show that Low Rank Adaptation (LoRA) as originally introduced in Hu et al. (2021) leads to suboptimal finetuning of models with large width (embedding dimension).
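For readers unfamiliar with the adapter in question, here is a minimal PyTorch sketch of a LoRA layer roughly as introduced in Hu et al. (2021); the initialization and hyperparameters shown are common defaults used here as assumptions, and the width-aware corrections analysed in the paper are deliberately not included.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen dense layer plus a trainable rank-r update, in the spirit of Hu et al. (2021)."""
    def __init__(self, d_in, d_out, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)                     # pretrained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, d_in) / d_in ** 0.5)  # down-projection, random init
        self.B = nn.Parameter(torch.zeros(d_out, r))               # up-projection, zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())
```

Only A and B receive gradients; the paper's argument, roughly, is that training both factors with a single shared learning rate becomes increasingly suboptimal as the embedding dimension grows.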
no code implementations • 3 Oct 2023 • Greg Yang, Dingli Yu, Chen Zhu, Soufiane Hayou
By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $\mu$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones.
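A hedged sketch of the transfer workflow only: tune the learning rate on a narrow proxy model and reuse it at a much larger width. The µP parametrization that makes this reuse valid is the subject of Tensor Programs IV and V and is not reproduced below; the toy least-squares objective and all names are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def final_loss(width, lr, steps=100):
    """Toy stand-in for 'train a model of this width with this learning rate and report its loss'.
    A real µP setup would parametrize the network as prescribed in Tensor Programs V."""
    X = rng.normal(size=(256, width)) / np.sqrt(width)
    y = rng.normal(size=256)
    w = np.zeros(width)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)     # plain gradient descent on least squares
    return float(np.mean((X @ w - y) ** 2))

# 1) sweep the learning rate on a narrow proxy model
lrs = [10.0 ** k for k in range(-4, 1)]
best_lr = min(lrs, key=lambda lr: final_loss(width=64, lr=lr))

# 2) under µP, the same learning rate is reused directly on the wide model
print("transferred lr:", best_lr, "wide-model loss:", final_loss(width=4096, lr=best_lr))
```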
no code implementations • 2 Oct 2023 • Soufiane Hayou
Our aim is to understand the behaviour of neural functions (functions that depend on a neural network model) as width and depth go to infinity (in some sense), and eventually to identify settings under which commutativity holds, i.e., the neural function tends to the same limit no matter how the width and depth limits are taken.
1 code implementation • 29 Sep 2023 • Jiayuan Ye, Anastasia Borovykh, Soufiane Hayou, Reza Shokri
We introduce an analytical framework to quantify the changes in a machine learning algorithm's output distribution following the inclusion of a few data points in its training set, a notion we define as leave-one-out distinguishability (LOOD).
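As a rough illustration of the underlying question (how much does the algorithm's output distribution move when one extra point is included?), the sketch below retrains a toy randomized learner with and without the point and compares its predictions at a query input over several seeds; LOOD's actual definition and the Gaussian-process machinery used in the paper are not reproduced here, and every name below is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_and_predict(X, y, x_query, seed):
    """Random-feature ridge regression, used as a stand-in for 'the learning algorithm'."""
    rs = np.random.default_rng(seed)
    W = rs.normal(size=(X.shape[1], 64))
    phi = lambda Z: np.tanh(Z @ W)
    w = np.linalg.solve(phi(X).T @ phi(X) + 1e-2 * np.eye(64), phi(X).T @ y)
    return float((phi(x_query[None]) @ w)[0])

X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
z_x, z_y = rng.normal(size=5), 3.0              # the extra point whose influence we probe
x_query = rng.normal(size=5)

preds_without = [train_and_predict(X, y, x_query, s) for s in range(20)]
preds_with = [train_and_predict(np.vstack([X, z_x]), np.append(y, z_y), x_query, s) for s in range(20)]

# crude distinguishability proxy: shift between the two prediction distributions at the query point
gap = abs(np.mean(preds_with) - np.mean(preds_without)) / (np.std(preds_without) + 1e-12)
print("normalized mean shift at the query point:", gap)
```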
no code implementations • 17 Sep 2023 • Soufiane Hayou
In this note, we revisit and extend an old analytic criterion for the Riemann Hypothesis (RH) known as the Nyman-Beurling criterion, which connects the RH to a minimization problem involving a special class of neural networks.
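For context, the criterion is usually stated along the following lines (a hedged recollection of the standard formulation; the exact version revisited in the note may differ): with $\rho(t) = t - \lfloor t \rfloor$ the fractional part,

$$\text{RH} \iff \inf_{n,\ c_k \in \mathbb{R},\ 0 < \theta_k \le 1} \left\| \mathbf{1}_{(0,1)} - \sum_{k=1}^{n} c_k\, \rho\!\left(\frac{\theta_k}{\,\cdot\,}\right) \right\|_{L^2(0,1)} = 0,$$

i.e., the RH holds if and only if the indicator of $(0,1)$ can be approximated arbitrarily well in $L^2$ by finite linear combinations of dilated fractional-part functions.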
no code implementations • 14 Feb 2023 • Fadhel Ayed, Soufiane Hayou
Data pruning algorithms are commonly used to reduce the memory and computational cost of the optimization process.
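A minimal sketch of the generic pipeline such algorithms share (score every training example, keep only the top fraction); the scoring rule below is a random placeholder, not one of the paper's pruning criteria.

```python
import numpy as np

def prune_by_score(X, y, scores, keep_frac=0.5):
    """Keep the `keep_frac` highest-scoring examples; the scoring rule is the whole game."""
    k = int(len(scores) * keep_frac)
    idx = np.argsort(scores)[-k:]          # indices of the top-k scores
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 10)), rng.integers(0, 2, size=1000)
scores = rng.random(1000)                  # placeholder; real pruners score by loss, margin, etc.
X_small, y_small = prune_by_score(X, y, scores, keep_frac=0.25)
print(X_small.shape)                       # (250, 10)
```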
no code implementations • 1 Feb 2023 • Soufiane Hayou, Greg Yang
We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken.
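A minimal PyTorch sketch of the scaling in question, i.e. residual branches multiplied by $1/\sqrt{L}$ in a network of depth $L$; the toy MLP architecture, width and activation are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ScaledResNet(nn.Module):
    """Toy residual MLP whose branches are scaled by 1/sqrt(depth)."""
    def __init__(self, width=512, depth=100):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)]
        )
        self.branch_scale = 1.0 / depth ** 0.5     # the 1/sqrt(L) factor

    def forward(self, x):
        for block in self.blocks:
            x = x + self.branch_scale * block(x)   # x_{l+1} = x_l + branch(x_l) / sqrt(L)
        return x

x = torch.randn(4, 512)
print(ScaledResNet()(x).std().item())              # stays well-behaved even at depth 100
```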
no code implementations • 3 Oct 2022 • Soufiane Hayou
Unlike the infinite-width limit, where the pre-activations converge weakly to a Gaussian random variable, we show that the infinite-depth limit yields different distributions depending on the choice of the activation function.
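A small simulation in that spirit (fixed small width, growing depth), purely to make the object concrete; the depth, width, scaling and activations below are arbitrary choices, not the regimes analysed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def last_preactivation(depth=50, width=3, phi=np.tanh, sigma_w=1.5):
    """Propagate one input through a random MLP of fixed small width and large depth,
    returning the first coordinate of the final pre-activation."""
    h = rng.normal(size=width)
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        h = W @ phi(h)
    return h[0]

def excess_kurtosis(x):
    z = (np.asarray(x) - np.mean(x)) / np.std(x)
    return float(np.mean(z ** 4) - 3.0)            # 0 for a Gaussian

for name, phi in [("tanh", np.tanh), ("relu", lambda z: np.maximum(z, 0.0))]:
    samples = [last_preactivation(phi=phi) for _ in range(5000)]
    print(f"{name}: excess kurtosis of the deep pre-activation = {excess_kurtosis(samples):.2f}")
```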
no code implementations • 22 Feb 2022 • Fusheng Liu, Haizhao Yang, Soufiane Hayou, Qianxiao Li
Optimization and generalization are two essential aspects of statistical machine learning.
no code implementations • 22 Oct 2021 • Soufiane Hayou, Bobby He, Gintare Karolina Dziugaite
In the linear model, we show that a PAC-Bayes generalization error bound is controlled by the magnitude of the change in feature alignment between the 'prior' and 'posterior' data.
no code implementations • 22 Oct 2021 • Yizhang Lou, Chris Mingard, Yoonsoo Nam, Soufiane Hayou
Recent work by Baratin et al. (2021) sheds light on an intriguing pattern that occurs during the training of deep neural networks: some layers align much more with the data than others (where the alignment is defined as the Euclidean product of the tangent feature matrix and the data label matrix).
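One hedged reading of that alignment, for a toy network: build each layer's tangent feature matrix from per-example gradients and take the normalized Frobenius inner product of the induced kernel with the label Gram matrix (Baratin et al.'s exact definition may differ in normalization and centering).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
X, y = torch.randn(32, 10), torch.randn(32, 1)

def layer_alignment(layer):
    """Cosine between the layer's tangent kernel K_l and the label Gram matrix y y^T."""
    feats = []
    for xi in X:                                   # gradient of the output w.r.t. this layer's params
        net.zero_grad()
        net(xi[None]).sum().backward()
        feats.append(torch.cat([p.grad.flatten() for p in layer.parameters()]))
    Phi = torch.stack(feats)                       # tangent feature matrix, shape (n, n_params_l)
    K = Phi @ Phi.t()                              # per-layer tangent kernel
    Y = y @ y.t()
    return (K * Y).sum() / (K.norm() * Y.norm())

for i, layer in enumerate(m for m in net if isinstance(m, nn.Linear)):
    print(f"layer {i}: alignment = {layer_alignment(layer).item():.3f}")
```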
no code implementations • NeurIPS Workshop ICBINB 2021 • Soufiane Hayou, Arnaud Doucet, Judith Rousseau
Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK).
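A minimal sketch of the empirical NTK at the current parameters, $K(x, x') = \langle \nabla_\theta f(x), \nabla_\theta f(x') \rangle$, for a toy scalar-output network (architecture and sizes are placeholders).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(8, 128), nn.Tanh(), nn.Linear(128, 1))

def grad_vector(x):
    """Gradient of the scalar output w.r.t. all parameters, flattened into one vector."""
    net.zero_grad()
    net(x[None]).sum().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

def empirical_ntk(x1, x2):
    """K(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> at the current parameters."""
    return torch.dot(grad_vector(x1), grad_vector(x2))

x_a, x_b = torch.randn(8), torch.randn(8)
print(empirical_ntk(x_a, x_b).item())
```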
no code implementations • NeurIPS 2021 • Soufiane Hayou, Fadhel Ayed
Regularization plays a major role in modern deep learning.
no code implementations • 24 Oct 2020 • Soufiane Hayou, Eugenio Clerico, Bobby He, George Deligiannidis, Arnaud Doucet, Judith Rousseau
Deep ResNet architectures have achieved state-of-the-art performance on many tasks.
no code implementations • ICLR 2021 • Soufiane Hayou, Jean-Francois Ton, Arnaud Doucet, Yee Whye Teh
Overparameterized Neural Networks (NNs) display state-of-the-art performance.
no code implementations • 31 May 2019 • Soufiane Hayou, Arnaud Doucet, Judith Rousseau
Recent work by Jacot et al. (2018) has shown that training a neural network of any kind with gradient descent in parameter space is strongly related to kernel gradient descent in function space with respect to the Neural Tangent Kernel (NTK).
no code implementations • 19 Feb 2019 • Soufiane Hayou, Arnaud Doucet, Judith Rousseau
The weight initialization and the activation function of deep neural networks have a crucial impact on the performance of the training procedure.
no code implementations • ICLR 2019 • Soufiane Hayou, Arnaud Doucet, Judith Rousseau
We complete this analysis by providing quantitative results showing that, for a class of ReLU-like activation functions, information indeed propagates deeper for an initialization at the edge of chaos.
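For ReLU networks the edge of chaos reduces to a weight variance of $2/\text{fan\_in}$ with zero bias variance (He-style initialization); a minimal PyTorch sketch of initializing at that point follows, with the toy architecture as an assumption (other activations have different edge-of-chaos curves).

```python
import torch
import torch.nn as nn

def init_edge_of_chaos_relu(model):
    """Initialize Linear layers at the ReLU edge of chaos: weight variance 2/fan_in, zero bias."""
    for m in model.modules():
        if isinstance(m, nn.Linear):
            fan_in = m.weight.shape[1]
            nn.init.normal_(m.weight, mean=0.0, std=(2.0 / fan_in) ** 0.5)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

net = nn.Sequential(*[layer for _ in range(50) for layer in (nn.Linear(256, 256), nn.ReLU())])
init_edge_of_chaos_relu(net)
x = torch.randn(8, 256)
print(net(x).std().item())   # the signal keeps a non-degenerate scale even through 50 layers
```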