1 code implementation • NeurIPS 2023 • Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli
In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters and improving generalization.
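Below is a minimal, hedged sketch (not code from the paper) of what "much simpler subnetworks" can mean concretely: after training, hidden units may become identical to one another or go to zero, so counting the distinct, nonzero rows of a weight matrix bounds the number of independent parameters. The function name `count_independent_units` and the tolerance `tol` are illustrative assumptions, not the authors' API.

```python
import numpy as np

def count_independent_units(W, tol=1e-6):
    """Count rows of W that are nonzero and pairwise distinct up to
    `tol` (a hypothetical diagnostic, not the paper's method)."""
    alive = W[np.linalg.norm(W, axis=1) > tol]  # drop dead (zero) units
    reps = []  # one representative per cluster of duplicate units
    for w in alive:
        if all(np.linalg.norm(w - r) > tol for r in reps):
            reps.append(w)
    return len(reps)

# 4 hidden units, but two are identical and one is dead -> 2 independent.
W = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [0.0, 0.0],
              [3.0, -1.0]])
print(count_independent_units(W))  # 2
```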
no code implementations • 7 Oct 2022 • Daniel Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli
We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics.
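As a sketch of the defining property (a hedged reconstruction of the standard quasi-homogeneity condition; the paper's exact notation may differ), a model $F$ with parameters $\theta \in \mathbb{R}^n$ is quasi-homogeneous when each parameter carries its own scaling exponent:

```latex
% Hedged sketch of the quasi-homogeneity condition; the exponents
% \lambda_i and degree d are assumed notation, possibly differing
% from the paper. F is quasi-homogeneous if there exist
% \lambda_1, \dots, \lambda_n \ge 0 and d > 0 such that for all \alpha > 0:
\[
  F\bigl(\alpha^{\lambda_1}\theta_1, \dots, \alpha^{\lambda_n}\theta_n\bigr)
    = \alpha^{d}\, F(\theta_1, \dots, \theta_n).
\]
% Homogeneous networks are the special case \lambda_1 = \dots = \lambda_n = 1;
% unequal exponents are what accommodate biases, residual connections,
% and normalization layers.
```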