no code implementations • 9 May 2023 • Hung-Hsu Chou, Holger Rauhut, Rachel Ward
By analyzing key invariants of the gradient flow and using the Łojasiewicz theorem, we show that weight normalization also has an implicit bias towards sparse solutions in the diagonal linear model, but that, in contrast to plain gradient flow, weight normalization yields a robust bias that persists even when the weights are initialized at a practically large scale.
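To make the weight-normalized reparameterization concrete, here is a minimal numpy sketch of gradient descent on w = g · v/‖v‖ for an underdetermined least-squares problem. The sparse ground truth, initialization scale, and step size are illustrative assumptions, and this generic vector-norm parameterization is only a stand-in for the diagonal linear model analyzed in the paper.

```python
import numpy as np

# Toy underdetermined regression: more parameters (d) than samples (n).
rng = np.random.default_rng(0)
n, d = 20, 100
X = rng.standard_normal((n, d))
w_star = np.zeros(d); w_star[:3] = [2.0, -1.5, 1.0]   # sparse ground truth (assumption)
y = X @ w_star

# Weight normalization: w = g * v / ||v||  (Salimans & Kingma reparameterization).
g = 1.0
v = rng.standard_normal(d)           # weights initialized at non-vanishing scale
lr = 1e-3

for _ in range(20000):
    norm_v = np.linalg.norm(v)
    w = g * v / norm_v
    grad_w = X.T @ (X @ w - y)       # gradient of 0.5 * ||Xw - y||^2 w.r.t. w
    # Chain rule through the reparameterization (g, v) -> w.
    grad_g = grad_w @ v / norm_v
    grad_v = g / norm_v * (grad_w - (grad_w @ v) * v / norm_v**2)
    g -= lr * grad_g
    v -= lr * grad_v

w = g * v / np.linalg.norm(v)
print("training residual:", np.linalg.norm(X @ w - y))
print("l1 norm of learned w vs. sparse w*:", np.linalg.norm(w, 1), np.linalg.norm(w_star, 1))
```

The printed ℓ1 norms are only for inspection; the paper's sparsity guarantees apply to its specific model and are not claimed for this toy run.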
no code implementations • 21 Dec 2021 • Hung-Hsu Chou, Johannes Maly, Holger Rauhut
In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples.
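As a minimal sketch of what "more parameters than training samples" means for a linear model (the sizes here are arbitrary assumptions), gradient descent from zero initialization still drives the training error to zero and, as is well known in this setting, converges to the minimum-ℓ2-norm interpolant:

```python
import numpy as np

# Overparameterized linear model: d parameters, n < d training samples.
rng = np.random.default_rng(1)
n, d = 30, 300                        # arbitrary sizes with d >> n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent from zero initialization on 0.5 * ||Xw - y||^2.
w = np.zeros(d)
lr = 1e-3
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)

# With d > n the system Xw = y is underdetermined, so the data can be interpolated
# exactly; the interesting question is which of the many interpolants is selected.
print("training error:", np.linalg.norm(X @ w - y))
print("distance to min-l2-norm interpolant:", np.linalg.norm(w - np.linalg.pinv(X) @ y))
```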
no code implementations • 27 Nov 2020 • Hung-Hsu Chou, Carsten Gieshoff, Johannes Maly, Holger Rauhut
This suggests that deep learning prefers trajectories whose complexity (measured in terms of effective rank) is monotonically increasing, which we believe is a fundamental concept for the theoretical understanding of deep learning.
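To make "effective rank" concrete, the sketch below uses the entropy-based effective rank of Roy and Vetterli (whether this is the exact measure used in the paper is an assumption) and tracks it along a gradient-descent trajectory of a toy two-layer matrix factorization. The target, initialization scale, and step size are illustrative; the monotonicity statement above is the paper's claim under its assumptions, not something this toy run is guaranteed to reproduce.

```python
import numpy as np

def effective_rank(M, eps=1e-12):
    """Entropy-based effective rank: exp of the Shannon entropy of the
    normalized singular value distribution (Roy & Vetterli)."""
    s = np.linalg.svd(M, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]
    return float(np.exp(-(p * np.log(p)).sum()))

# Toy two-layer matrix factorization W2 @ W1 fit to a rank-3 target.
rng = np.random.default_rng(2)
d, r = 30, 3
target = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))

scale = 1e-3                           # small initialization scale (assumption)
W1 = scale * rng.standard_normal((d, d))
W2 = scale * rng.standard_normal((d, d))
lr = 0.01

for step in range(2001):
    E = W2 @ W1 - target               # residual of 0.5 * ||W2 W1 - target||_F^2
    W1, W2 = W1 - lr * W2.T @ E, W2 - lr * E @ W1.T
    if step % 250 == 0:
        print(step, "loss=%.3e" % (0.5 * np.linalg.norm(E) ** 2),
              "eff_rank=%.2f" % effective_rank(W2 @ W1))
```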
no code implementations • 15 Jun 2020 • Yuege Xie, Hung-Hsu Chou, Holger Rauhut, Rachel Ward
Motivated by surprisingly good generalization properties of learned deep neural networks in overparameterized scenarios and by the related double descent phenomenon, this paper analyzes the relation between smoothness and low generalization error in an overparameterized linear learning problem.
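A common way to see the double descent phenomenon referred to here is to sweep the number of features past the number of samples and record the test error of the pseudoinverse solution. The sketch below uses plain minimum-norm least squares with isotropic Gaussian features; the specific data model and the smoothness-weighted optimization studied in the paper are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
n_train, n_test, d_max = 40, 500, 120
# Ground-truth linear model in d_max dimensions (assumption: isotropic Gaussian features).
w_star = rng.standard_normal(d_max) / np.sqrt(d_max)
X_train = rng.standard_normal((n_train, d_max))
X_test = rng.standard_normal((n_test, d_max))
y_train = X_train @ w_star + 0.1 * rng.standard_normal(n_train)
y_test = X_test @ w_star

for d in [10, 20, 35, 40, 45, 60, 120]:
    # Fit using only the first d features; pinv gives the least-squares solution
    # for d <= n_train and the minimum-l2-norm interpolant for d > n_train.
    w_hat = np.linalg.pinv(X_train[:, :d]) @ y_train
    test_err = np.mean((X_test[:, :d] @ w_hat - y_test) ** 2)
    print(f"d={d:4d}  test MSE={test_err:.3f}")
```

The test error typically peaks near the interpolation threshold d = n_train and decreases again in the overparameterized regime, the qualitative shape behind the double descent curve.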