1 code implementation • 11 Jun 2020 • Eugene A. Golikov
We propose a general framework to study how the limit behavior of neural models depends on the scaling of hyperparameters with network width.
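The kind of width-dependent scaling the abstract refers to can be made concrete with a toy two-layer network whose output is multiplied by width**(-a). This is a minimal illustrative sketch, not the paper's framework: the parameterization, activation, and the labels "NTK-style" (a = 1/2) and "mean-field" (a = 1) are standard conventions I am assuming here, not taken from the paper.

```python
import numpy as np

def two_layer_forward(x, W, v, width, a):
    """Two-layer net whose output is scaled by width**(-a).

    a = 0.5 is the NTK-style 1/sqrt(n) scaling; a = 1.0 is the
    mean-field 1/n scaling. (Illustrative parameterization only.)
    """
    h = np.tanh(W @ x)               # hidden activations, shape (width,)
    return width ** (-a) * (v @ h)   # width-dependent output multiplier

rng = np.random.default_rng(0)
for n in (100, 10_000):
    x = rng.normal(size=5)
    W = rng.normal(size=(n, 5))
    v = rng.normal(size=n)
    print(n,
          two_layer_forward(x, W, v, n, a=0.5),   # stays O(1) as n grows
          two_layer_forward(x, W, v, n, a=1.0))   # shrinks as n grows
```

Running this shows the output staying O(1) at initialization under a = 0.5 while vanishing under a = 1.0, which is exactly the sort of scaling-dependent limit behavior the framework is meant to classify.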
1 code implementation • ICML 2020 • Eugene A. Golikov
We show that for networks with more than two hidden layers, RMSProp training has a non-trivial discrete-time mean-field (MF) limit, whereas GD training does not.
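The contrast can be illustrated directly from the two update rules. The toy comparison below (my own sketch, not the paper's derivation) shows that when gradients shrink with width, the plain GD step vanishes, while the RMSProp step, which normalizes each coordinate by a running RMS of its gradients, keeps a magnitude of order the learning rate:

```python
import numpy as np

rng = np.random.default_rng(0)
lr, beta, eps = 1e-2, 0.9, 1e-8

for n in (1e2, 1e6):                   # stand-in for network width
    grad = rng.normal(size=4) / n      # gradients that vanish as width grows
    gd_step = lr * grad                # plain GD: step vanishes with grad
    v = (1 - beta) * grad ** 2         # one RMSProp accumulator step from zero
    rms_step = lr * grad / (np.sqrt(v) + eps)  # normalized: ~lr regardless of scale
    print(f"n={n:.0e}  |GD step|={np.linalg.norm(gd_step):.2e}  "
          f"|RMSProp step|={np.linalg.norm(rms_step):.2e}")
```

This scale-invariance of the RMSProp step is one intuition for why its discrete-time limit can stay non-trivial where GD's degenerates.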
no code implementations • 13 Nov 2019 • Biswarup Das, Eugene A. Golikov
We prove that if an activation function satisfies some mild conditions, and the number of neurons in a two-layer fully connected neural network with this activation exceeds a certain threshold, then gradient descent on the quadratic loss finds input-layer weights attaining a global minimum in linear time.
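A toy version of this setting is easy to run: train only the input-layer weights of an overparameterized two-layer network on the quadratic loss and watch the loss decay steadily. This is an illustration of the regime, not the paper's construction; the tanh activation, 1/sqrt(m) output scaling, width, and learning rate below are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                  # samples, input dim, width "beyond the threshold"
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.normal(size=n)

W = rng.normal(size=(m, d))            # trainable input-layer weights
a = rng.choice([-1.0, 1.0], size=m)    # output-layer weights, held fixed

lr = 0.1
for t in range(501):
    H = np.tanh(W @ X.T)                            # (m, n) hidden activations
    r = H.T @ a / np.sqrt(m) - y                    # prediction residuals
    # Gradient of the quadratic loss 0.5 * ||r||**2 w.r.t. W only:
    grad = ((1 - H ** 2) * (a[:, None] * r[None, :])) @ X / np.sqrt(m)
    W -= lr * grad
    if t % 100 == 0:
        print(t, 0.5 * np.sum(r ** 2))
```

With the width well above the sample count, the printed loss shrinks at a roughly geometric rate, which is the empirical face of the linear-time convergence guarantee.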