Stochastic algorithms under single spiked models

17 May 2019 · Emile Richard

We study SGD and Adam for estimating a rank-one signal planted in matrix or tensor noise. The extreme simplicity of the problem setup allows us to isolate the effects of several factors: signal-to-noise ratio, density of critical points, stochasticity, and initialization. We observe a surprising phenomenon: Adam seems to get stuck in local minima as soon as polynomially many critical points appear (matrix case), while SGD escapes them. However, when the number of critical points grows exponentially (tensor case), both algorithms get trapped. Theory tells us that at fixed SNR the problem becomes intractable for large $d$, and in our experiments SGD does not escape this regime. We exhibit the benefits of warm starting in those situations. We conclude that in this class of problems, warm starting cannot be replaced by stochasticity in gradients when searching for the basin of attraction.
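The single spiked matrix model the abstract refers to can be sketched in a few lines. Below is a hedged illustration, not the authors' code: the observation is $Y = \mathrm{snr} \cdot vv^\top + W$ with $W$ symmetric Gaussian noise, and a noisy projected gradient method (standing in for SGD; the noise term and all constants are assumptions for illustration) ascends $x^\top Y x$ over the unit sphere to recover the planted direction.

```python
import numpy as np

rng = np.random.default_rng(0)
d, snr, lr, steps = 50, 5.0, 0.01, 2000  # illustrative choices, not from the paper

# Planted rank-one signal: Y = snr * v v^T + symmetric Gaussian noise
# scaled so the noise operator norm is O(1).
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
G = rng.standard_normal((d, d))
Y = snr * np.outer(v, v) + (G + G.T) / np.sqrt(2 * d)

# Noisy projected gradient ascent on f(x) = x^T Y x over the unit sphere;
# the added Gaussian perturbation is a stand-in for minibatch stochasticity.
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
for _ in range(steps):
    grad = 2 * Y @ x + 0.5 * rng.standard_normal(d)  # noisy gradient
    x += lr * grad
    x /= np.linalg.norm(x)  # retract to the sphere

overlap = abs(x @ v)  # overlap near 1 means the spike is recovered
```

At this SNR the spike sits well above the noise level, so the iterate aligns with $v$; lowering `snr` toward the noise operator norm makes recovery degrade, which is the regime the paper probes.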
