24 Mar 2023 • Mahdi Soltanolkotabi, Dominik Stöger, Changzhi Xie
We show that in this setting, factorized gradient descent enjoys two implicit properties: (1) a coupling property, where the factors remain coupled in various ways throughout the gradient descent trajectory, and (2) an algorithmic regularization property, where the iterates show a propensity towards low-rank models despite the overparameterized nature of the factorized model.
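The two properties can be observed numerically in a toy problem. The sketch below is only an illustration, not the authors' setup: it assumes a fully observed asymmetric factorization loss 0.5·||UVᵀ − X||_F² with a small random initialization, and the function name `factorized_gd`, the dimensions, the step size, and the initialization scale are all hypothetical choices. It tracks the imbalance UᵀU − VᵀV as one simple proxy for coupling between the factors, and the singular values of UVᵀ as a proxy for the low-rank tendency.

```python
import numpy as np

def factorized_gd(X, k, alpha=1e-3, eta=0.05, steps=2000, seed=0):
    """Gradient descent on 0.5 * ||U V^T - X||_F^2 over both factors.

    Hypothetical helper for illustration: k is the (overparameterized)
    inner dimension, alpha the scale of the random initialization.
    """
    rng = np.random.default_rng(seed)
    n1, n2 = X.shape
    U = alpha * rng.standard_normal((n1, k))
    V = alpha * rng.standard_normal((n2, k))
    for _ in range(steps):
        R = U @ V.T - X  # residual at the current iterate
        # Simultaneous update: both gradients use the old U and V.
        U, V = U - eta * R @ V, V - eta * R.T @ U
    return U, V

# Rank-2 target of size 20 x 15 with singular values 1.0 and 0.5 (assumed).
rng = np.random.default_rng(1)
P, _ = np.linalg.qr(rng.standard_normal((20, 2)))
Q, _ = np.linalg.qr(rng.standard_normal((15, 2)))
X = P @ np.diag([1.0, 0.5]) @ Q.T

# Overparameterized: inner dimension k = 10 although rank(X) = 2.
U, V = factorized_gd(X, k=10)

imbalance = np.linalg.norm(U.T @ U - V.T @ V)          # coupling proxy
sing_vals = np.linalg.svd(U @ V.T, compute_uv=False)   # spectrum of U V^T
rel_err = np.linalg.norm(U @ V.T - X) / np.linalg.norm(X)

print("imbalance ||U^T U - V^T V||_F:", imbalance)
print("top singular values of U V^T:", sing_vals[:4])
print("relative error:", rel_err)
```

In this toy run the factors are free to use all ten inner dimensions, so a small imbalance and a sharp drop after the second singular value would be consistent with the coupling and low-rank tendencies described above. It says nothing about the guarantees themselves, which depend on the setting the authors analyze.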