no code implementations • 7 Mar 2023 • Li Jinli, Yuan Ye
However, such a model commonly encounters the problem of slow convergence because a standard SGD algorithm learns a latent factor relying on the stochastic gradient of current instance error only without considering past update information.