no code implementations • 30 Dec 2020 • Victor Luo, Yazhen Wang, Glenn Fung
In this paper, we seek to extend the mean-field results of Mei et al. (2018) from two-layer neural networks with one hidden layer to three-layer neural networks with two hidden layers.
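The mean-field regime for a two-layer network of the kind studied by Mei et al. (2018) averages the hidden units' contributions with a 1/N factor, so the output concentrates as the width N grows. The following is a minimal illustrative sketch that assumes a tanh activation and i.i.d. Gaussian neurons. The function names are hypothetical, and the sketch does not reproduce the paper's three-layer construction.

```python
import math
import random


def mean_field_two_layer(x, params):
    """Two-layer (one hidden layer) network with mean-field scaling:
    f(x) = (1/N) * sum_i a_i * tanh(<w_i, x>), where N = len(params)."""
    total = 0.0
    for a, w in params:
        pre_activation = sum(wj * xj for wj, xj in zip(w, x))
        total += a * math.tanh(pre_activation)
    return total / len(params)


def sample_params(n, dim, rng):
    """Draw n i.i.d. neurons (a_i, w_i); the Gaussian distribution is an assumption."""
    return [
        (rng.gauss(1.0, 1.0), [rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(n)
    ]


def output_spread(n, x, trials=20):
    """Range of f(x) across independently initialized networks of width n."""
    outputs = [
        mean_field_two_layer(x, sample_params(n, len(x), random.Random(seed)))
        for seed in range(trials)
    ]
    return max(outputs) - min(outputs)
```

For example, `output_spread(2000, [0.3, -0.2, 0.5])` is typically far smaller than `output_spread(10, [0.3, -0.2, 0.5])`. This law-of-large-numbers behaviour is what lets a mean-field analysis describe the network through the distribution of its neurons rather than through the individual weights.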
no code implementations • 24 Sep 2020 • Victor Luo, Yazhen Wang
The influencing factors identified in the literature include the learning rate, batch size, Hessian, and gradient covariance. Stochastic differential equations (SDEs) are used to model SGD and to establish how these factors relate to one another in characterizing the minima that SGD finds.
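The following minimal sketch shows how an SDE can tie these factors together. It uses a one-dimensional quadratic loss L(θ) = hθ²/2, where h plays the role of the Hessian, and assumes that the minibatch gradient carries additive noise of variance σ²/B (learning rate η, batch size B). These modeling choices and the function names are assumptions for illustration and are not the model from the paper. The approximating SDE dθ = -hθ dt + sqrt(ησ²/B) dW is an Ornstein-Uhlenbeck process, so its stationary variance has a closed form.

```python
import math
import random


def sde_stationary_variance(lr, batch_size, hessian, grad_var):
    """Stationary variance of d(theta) = -h*theta dt + s dW with s^2 = lr*grad_var/batch_size
    (an Ornstein-Uhlenbeck process): s^2 / (2h)."""
    return lr * grad_var / (2.0 * hessian * batch_size)


def sgd_stationary_variance(lr, batch_size, hessian, grad_var):
    """Exact stationary variance of discrete SGD on L(theta) = h*theta^2/2 when the
    minibatch gradient is h*theta plus zero-mean noise of variance grad_var/batch_size.
    Requires 0 < lr*hessian < 2 for stability."""
    return lr * grad_var / (batch_size * hessian * (2.0 - lr * hessian))


def simulate_sgd(lr, batch_size, hessian, grad_var, steps, burn_in, seed=0):
    """Run noisy SGD (identical here to Euler-Maruyama on the SDE with dt = lr)
    and return the empirical variance of theta after the burn-in period."""
    rng = random.Random(seed)
    noise_std = math.sqrt(grad_var / batch_size)
    theta = 0.0
    total = total_sq = 0.0
    count = 0
    for k in range(steps):
        grad = hessian * theta + rng.gauss(0.0, noise_std)
        theta -= lr * grad
        if k >= burn_in:
            total += theta
            total_sq += theta * theta
            count += 1
    mean = total / count
    return total_sq / count - mean * mean


print(sde_stationary_variance(0.05, 8, 1.0, 4.0))  # 0.0125
```

In this toy setting, the spread of the iterates around the minimum grows with η/B and the gradient noise σ², and it shrinks with the curvature h. The SDE and exact-SGD formulas differ only by the factor 2/(2 - ηh), so they agree as ηh → 0, which is the regime in which SDE approximations of SGD are usually justified.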