Search Results for author: Margalit Glasgow

Found 6 papers, 1 paper with code

SGD Finds then Tunes Features in Two-Layer Neural Networks with near-Optimal Sample Complexity: A Case Study in the XOR problem

no code implementations · 26 Sep 2023 · Margalit Glasgow

To our knowledge, this work is the first to give a sample complexity of $\tilde{O}(d)$ for efficiently learning the XOR function on isotropic data on a standard neural network with standard training.
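The snippet below is a minimal illustrative sketch of this setting, not the paper's construction or analysis: a two-layer ReLU network trained with plain minibatch SGD on the XOR-like target $y = x_1 x_2$ over isotropic $\pm 1$ inputs in $d$ dimensions. The width, learning rate, sample size, and initialization scales are arbitrary assumptions chosen only to make the example run.

```python
# Illustrative sketch only (not the paper's construction): a two-layer ReLU
# network trained with plain minibatch SGD on the XOR-like target
# y = x_1 * x_2 over isotropic +/-1 inputs in d dimensions.
import numpy as np

rng = np.random.default_rng(0)
d, n, width, lr, epochs, batch_size = 20, 2000, 64, 0.05, 200, 50

X = rng.choice([-1.0, 1.0], size=(n, d))            # isotropic Boolean data
y = X[:, 0] * X[:, 1]                                # XOR of the first two coordinates

W = rng.normal(scale=1.0 / np.sqrt(d), size=(width, d))      # first-layer weights
a = rng.normal(scale=1.0 / np.sqrt(width), size=width)       # second-layer weights

for _ in range(epochs):
    for batch in np.array_split(rng.permutation(n), n // batch_size):
        Xb, yb = X[batch], y[batch]
        h = np.maximum(Xb @ W.T, 0.0)               # hidden ReLU activations
        grad_out = h @ a - yb                        # squared-loss residual
        da = h.T @ grad_out / len(batch)             # second-layer gradient
        dW = ((grad_out[:, None] * (h > 0)) * a).T @ Xb / len(batch)  # first-layer gradient
        a -= lr * da
        W -= lr * dW

acc = np.mean(np.sign(np.maximum(X @ W.T, 0.0) @ a) == y)
print(f"train accuracy: {acc:.3f}")
```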

Tight Bounds for $\gamma$-Regret via the Decision-Estimation Coefficient

no code implementations · 6 Mar 2023 · Margalit Glasgow, Alexander Rakhlin

Our lower bound shows that the $\gamma$-DEC is a fundamental limit for any model class $\mathcal{F}$: for any algorithm, there exists some $f \in \mathcal{F}$ for which the $\gamma$-regret of that algorithm scales (nearly) with the $\gamma$-DEC of $\mathcal{F}$.

Max-Margin Works while Large Margin Fails: Generalization without Uniform Convergence

no code implementations · 16 Jun 2022 · Margalit Glasgow, Colin Wei, Mary Wootters, Tengyu Ma

Nagarajan and Kolter (2019) show that in certain simple linear and neural-network settings, any uniform convergence bound will be vacuous, leaving open the question of how to prove generalization in settings where UC fails.

Generalization Bounds · Memorization

Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective

1 code implementation · 5 Nov 2021 · Margalit Glasgow, Honglin Yuan, Tengyu Ma

In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows that the existing FedAvg upper bound analysis cannot be improved.

Federated Learning
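For readers unfamiliar with the algorithm the bounds concern, here is a minimal sketch of FedAvg (Local SGD): each client runs $K$ local SGD steps from the current global model, and the server averages the resulting iterates. The quadratic client losses, noise level, and all constants below are illustrative assumptions, not the paper's setting.

```python
# Minimal FedAvg / Local SGD sketch: M clients each take K local SGD steps on
# their own (placeholder quadratic) objective, then the server averages.
import numpy as np

rng = np.random.default_rng(1)
M, K, rounds, lr, dim = 8, 10, 50, 0.1, 5

# client m's local loss is 0.5 * ||w - b_m||^2, with its own optimum b_m
b = rng.normal(size=(M, dim))
w_global = np.zeros(dim)

for _ in range(rounds):
    local_iterates = []
    for m in range(M):
        w = w_global.copy()
        for _ in range(K):
            noise = 0.1 * rng.normal(size=dim)     # stochastic gradient noise
            w -= lr * ((w - b[m]) + noise)          # local SGD step
        local_iterates.append(w)
    w_global = np.mean(local_iterates, axis=0)      # server averaging step

print("distance to global optimum:", np.linalg.norm(w_global - b.mean(axis=0)))
```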

Asynchronous Distributed Optimization with Stochastic Delays

no code implementations · 22 Sep 2020 · Margalit Glasgow, Mary Wootters

This complexity sits squarely between the complexity $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ of SAGA \textit{without delays} and the complexity $\tilde{O}\left(\left(n + m\kappa\right)\log(1/\epsilon)\right)$ of parallel asynchronous algorithms where the delays are \textit{arbitrary} (but bounded by $O(m)$), and the data is accessible by all.

Distributed Optimization
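To ground the quantities $n$ and $\kappa$ in the comparison above, the following is a minimal sketch of serial SAGA without delays, the baseline whose $\tilde{O}\left(\left(n + \kappa\right)\log(1/\epsilon)\right)$ rate is quoted. The least-squares objective, step size, and dimensions are illustrative assumptions; the paper's asynchronous, delayed variant is not reproduced here.

```python
# Minimal serial SAGA sketch (no delays): keep a table of the last gradient
# computed for each example and use it as a variance-reduction control variate.
import numpy as np

rng = np.random.default_rng(3)
n, dim, lr, steps = 100, 10, 0.02, 5000

A = rng.normal(size=(n, dim))
b = rng.normal(size=n)
# per-example gradient of 0.5 * (a_i^T x - b_i)^2
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]

x = np.zeros(dim)
table = np.array([grad_i(x, i) for i in range(n)])   # stored per-example gradients
table_avg = table.mean(axis=0)

for _ in range(steps):
    i = rng.integers(n)
    g = grad_i(x, i)
    x -= lr * (g - table[i] + table_avg)              # SAGA update
    table_avg += (g - table[i]) / n                    # maintain the running mean
    table[i] = g

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```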

Approximate Gradient Coding with Optimal Decoding

no code implementations · 17 Jun 2020 · Margalit Glasgow, Mary Wootters

Recent work has studied approximate gradient coding, which concerns coding schemes where the replication factor of the data is too low to recover the full gradient exactly.

Distributed Optimization
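Below is a minimal illustrative sketch of the approximate gradient coding setting, assuming a cyclic data assignment with replication factor $r$, random stragglers, and a naive averaging decoder; the block gradients are random placeholders and the paper's optimal decoding is not reproduced here.

```python
# Sketch of approximate gradient coding: each worker stores r data blocks and
# returns one coded message (the sum of its blocks' gradients); the server
# decodes an approximation of the full gradient from the non-straggling workers.
import numpy as np

rng = np.random.default_rng(2)
n_parts, n_workers, r, dim = 12, 12, 2, 4

block_grads = rng.normal(size=(n_parts, dim))       # gradient of each data block
full_grad = block_grads.mean(axis=0)                # target: the full gradient

# cyclic assignment: worker i stores blocks i, i+1, ..., i+r-1 (mod n_parts)
assignment = [[(i + j) % n_parts for j in range(r)] for i in range(n_workers)]

responders = rng.random(n_workers) > 0.3            # roughly 30% stragglers
messages = [block_grads[assignment[i]].sum(axis=0)
            for i in range(n_workers) if responders[i]]

# naive decoder: average the received messages and rescale by blocks per message;
# exact when every worker responds, only approximate under stragglers
approx_grad = np.mean(messages, axis=0) / r if messages else np.zeros(dim)
print("relative error:", np.linalg.norm(approx_grad - full_grad) / np.linalg.norm(full_grad))
```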
