Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

We consider the problem of computing the best-fitting ReLU with respect to square loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\mathsf{opt} < 1$ be the population loss of the best-fitting ReLU. We prove:

1. Finding a ReLU with square loss $\mathsf{opt} + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply \emph{unconditionally} that gradient descent cannot converge to the global minimum in polynomial time.
2. There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\mathsf{opt}^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss.

Prior work due to Soltanolkotabi [Sol17] showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals if the training set is exactly labeled by a ReLU.

NeurIPS 2019
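
For context on the gradient-descent claim in result 1 and the realizable-case guarantee of [Sol17], here is a minimal NumPy sketch of gradient descent on the empirical square loss when the labels are exactly a ReLU. The dimension, sample size, initialization, step size, and iteration count are illustrative choices of ours, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 10, 5000                        # illustrative dimension and sample size

# Planted unit-norm direction for the realizable case (our own choice).
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

X = rng.standard_normal((n, d))        # examples with spherical Gaussian marginals
y = np.maximum(X @ w_star, 0.0)        # labels exactly a ReLU, as in [Sol17]

def square_loss(w):
    return np.mean((np.maximum(X @ w, 0.0) - y) ** 2)

w = 0.01 * rng.standard_normal(d)      # small random initialization
eta = 0.1                              # illustrative step size
for _ in range(2000):
    pred = np.maximum(X @ w, 0.0)
    # (sub)gradient of the empirical square loss with respect to w
    grad = 2.0 / n * ((pred - y) * (X @ w > 0)) @ X
    w -= eta * grad

print(square_loss(w))                  # tends toward 0 in this realizable setting
```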
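To make the reduction target in result 2 concrete, the sketch below evaluates halfspace hypotheses under $0/1$ loss against thresholded ReLU labels. The thresholding $b = \mathbf{1}\{y > 0\}$ and the simple corruption model are our own illustration of the noisy-halfspace setting, not the paper's actual reduction.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 10, 5000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

X = rng.standard_normal((n, d))        # spherical Gaussian marginals
y = np.maximum(X @ w_star, 0.0)        # clean ReLU labels ...
corrupt = rng.random(n) < 0.05         # ... except a 5% arbitrarily labeled fraction
y[corrupt] = 1.0

# 0/1 labels marking where the observed label is positive (hypothetical thresholding).
b = (y > 0).astype(int)

def zero_one_loss(v):
    # fraction of points where the halfspace sign(v . x) disagrees with b
    return np.mean(((X @ v) > 0).astype(int) != b)

print(zero_one_loss(w_star))           # small: at most the corruption rate
print(zero_one_loss(-w_star))          # near 1 for the opposite halfspace
```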
