no code implementations • 6 Mar 2024 • Aaron Mishkin, Ahmed Khaled, Yuanhao Wang, Aaron Defazio, Robert M. Gower
We develop new sub-optimality bounds for gradient descent (GD) that depend on the conditioning of the objective along the path of optimization, rather than on global, worst-case constants.
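To illustrate the gap between path-wise and worst-case conditioning, here is a minimal numerical sketch on a hypothetical ill-conditioned quadratic (not the paper's bound): along GD's actual trajectory, the effective smoothness is often far below the global Lipschitz constant.

```python
import numpy as np

# Hypothetical quadratic with a few very stiff directions; global smoothness L = 100.
rng = np.random.default_rng(0)
eigs = np.concatenate([np.full(5, 100.0), np.full(45, 1.0)])
A = np.diag(eigs)

def grad(x):
    return A @ x

x = rng.standard_normal(50)
L_global = eigs.max()
step = 1.0 / L_global

for k in range(20):
    x_new = x - step * grad(x)
    # Smoothness measured along the segment GD actually traverses; it is often
    # much smaller than the global worst-case constant L.
    local_L = np.linalg.norm(grad(x_new) - grad(x)) / np.linalg.norm(x_new - x)
    print(f"iter {k:2d}: path-wise smoothness ~ {local_L:6.2f}  vs global L = {L_global:.0f}")
    x = x_new
```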
no code implementations • 12 Feb 2024 • Ahmed Khaled, Chi Jin
For the task of finding a stationary point of a smooth and potentially nonconvex function, we give a variant of SGD that matches the best-known high-probability convergence rate for tuned SGD at only an additional polylogarithmic cost.
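For contrast, a sketch of the tuned-SGD baseline such a result is measured against: the classic step-size choice needs the smoothness $L$, the noise level $\sigma$, and the initial gap $\Delta$, which is exactly the problem knowledge a tuning-free variant seeks to avoid. This is an illustrative baseline only, not the paper's method, and the step-size formula is the standard one up to constants.

```python
import numpy as np

def tuned_sgd(stoch_grad, x0, L, sigma, T, Delta, rng=np.random.default_rng(0)):
    """Baseline tuned SGD for finding an approximate stationary point of a smooth,
    possibly nonconvex objective. The step size depends on L, sigma, and Delta,
    i.e. on quantities that must be known or tuned in advance.
    (Illustrative baseline, not the paper's tuning-free variant.)"""
    eta = min(1.0 / L, np.sqrt(Delta / (L * sigma**2 * T)))  # classic tuned step size (constants omitted)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(T):
        x = x - eta * stoch_grad(x, rng)  # stoch_grad is a hypothetical noisy-gradient oracle
    return x
```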
1 code implementation • NeurIPS 2023 • Ahmed Khaled, Konstantin Mishchenko, Chi Jin
This paper proposes a new easy-to-implement parameter-free gradient-based optimizer: DoWG (Distance over Weighted Gradients).
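A minimal sketch of what a distance-over-weighted-gradients step could look like, assuming the step size is the squared running distance from the initial point divided by the square root of a distance-weighted sum of squared gradient norms; the initial estimate `r_eps` and the omission of any projection step are simplifications, so consult the paper for the exact algorithm.

```python
import numpy as np

def dowg_sketch(grad, x0, steps=100, r_eps=1e-4):
    """Sketch of a DoWG-style parameter-free step size: distance travelled from x0
    over a weighted sum of past gradient norms. Assumed update rule; see the paper
    for the exact algorithm (projection, constants)."""
    x0 = np.asarray(x0, dtype=float)
    x = x0.copy()
    r = r_eps   # running estimate of the distance from the initial point
    v = 0.0     # distance-weighted sum of squared gradient norms
    for _ in range(steps):
        g = grad(x)
        r = max(r, np.linalg.norm(x - x0))
        v += r**2 * np.linalg.norm(g)**2
        x = x - (r**2 / np.sqrt(v)) * g   # distance-over-weighted-gradients step
    return x

# Toy usage: minimize a simple quadratic without tuning a learning rate.
x_hat = dowg_sketch(lambda x: 2.0 * x, x0=np.ones(3), steps=1000)
```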
no code implementations • 6 Sep 2022 • Ahmed Khaled, Chi Jin
Federated learning (FL) is a subfield of machine learning where multiple clients try to collaboratively learn a model over a network under communication constraints.
1 code implementation • 14 Jun 2022 • Abdurakhmon Sadiev, Grigory Malinovsky, Eduard Gorbunov, Igor Sokolov, Ahmed Khaled, Konstantin Burlachenko, Peter Richtárik
To reveal the true advantages of RR in distributed learning with compression, we propose a new method called DIANA-RR that reduces the compression variance and has provably better convergence rates than existing counterparts that use with-replacement sampling of stochastic gradients.
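A hedged sketch of the DIANA-style mechanism that DIANA-RR builds on: compressing gradient differences against a learned shift rather than the gradients themselves, so that the compression error shrinks over time. This omits the random-reshuffling part and is not the full DIANA-RR method; the problem data and constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_k(v, k):
    """Unbiased rand-k sparsification: keep k random coordinates, rescaled by d/k."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = (v.size / k) * v[idx]
    return out

# Each client keeps a shift h_i, sends the compressed difference g_i - h_i, and
# slowly moves h_i toward g_i, so the variance of the compressed messages vanishes
# as h_i approaches g_i.
n, d, alpha, lr = 10, 20, 0.1, 0.05   # alpha roughly 1/(1 + omega) for rand-k compression
A = [rng.standard_normal((30, d)) for _ in range(n)]
b = [rng.standard_normal(30) for _ in range(n)]
h = [np.zeros(d) for _ in range(n)]
x = np.zeros(d)

for _ in range(300):
    estimates = []
    for i in range(n):
        g_i = A[i].T @ (A[i] @ x - b[i]) / 30   # client i's local least-squares gradient
        m_i = rand_k(g_i - h[i], k=2)            # compressed message actually sent
        estimates.append(h[i] + m_i)             # server-side reconstruction of g_i
        h[i] = h[i] + alpha * m_i                # shift update, mirrored on the server
    x = x - lr * np.mean(estimates, axis=0)
```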
no code implementations • 22 Nov 2021 • Elnur Gasanov, Ahmed Khaled, Samuel Horváth, Peter Richtárik
A persistent problem in federated learning is that it is not clear what the optimization objective should be: the standard average risk minimization of supervised learning is inadequate for handling several major constraints specific to federated learning, such as communication adaptivity and personalization control.
1 code implementation • NeurIPS 2021 • Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
Random Reshuffling (RR), also known as Stochastic Gradient Descent (SGD) without replacement, is a popular and theoretically grounded method for finite-sum minimization.
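A minimal sketch of the method on a toy finite sum: one pass over a fresh permutation per epoch, i.e. sampling without replacement, in contrast to plain SGD's with-replacement sampling.

```python
import numpy as np

def random_reshuffling(grads, x0, lr=0.1, epochs=10, rng=np.random.default_rng(0)):
    """Random Reshuffling: each epoch draws a fresh permutation of the n component
    gradients and processes them once each (sampling without replacement), unlike
    plain SGD, which samples an index with replacement at every step."""
    x = np.asarray(x0, dtype=float).copy()
    n = len(grads)
    for _ in range(epochs):
        for i in rng.permutation(n):
            x = x - lr * grads[i](x)
    return x

# Toy finite sum: f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2, minimized at mean(a).
a = np.array([1.0, 2.0, 3.0, 4.0])
x_rr = random_reshuffling([(lambda x, ai=ai: x - ai) for ai in a], x0=0.0)
```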
no code implementations • 20 Jun 2020 • Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, Robert M. Gower, Peter Richtárik
We showcase this by obtaining a simple formula for the optimal minibatch size of two variance-reduced methods (L-SVRG and SAGA).
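For reference, a sketch of the single-sample L-SVRG method the minibatch analysis applies to; the optimal-minibatch-size formula itself is in the paper and is not reproduced here.

```python
import numpy as np

def l_svrg(grads, full_grad, x0, lr=0.1, p=None, steps=500, rng=np.random.default_rng(0)):
    """Loopless SVRG: the usual SVRG variance-reduced gradient estimator, but the
    reference point w is refreshed at random with probability p instead of on a fixed
    inner/outer loop schedule. Single-sample sketch; the minibatch version replaces
    the sampled index i with a sampled minibatch."""
    n = len(grads)
    p = p if p is not None else 1.0 / n
    x = np.asarray(x0, dtype=float).copy()
    w, gw = x.copy(), full_grad(x)          # reference point and its full gradient
    for _ in range(steps):
        i = rng.integers(n)
        g = grads[i](x) - grads[i](w) + gw  # variance-reduced gradient estimator
        x = x - lr * g
        if rng.random() < p:                # loopless reference-point update
            w, gw = x.copy(), full_grad(x)
    return x
```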
1 code implementation • NeurIPS 2020 • Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik
We show that RR improves the dependence on the condition number from $\kappa$ to $\sqrt{\kappa}$ and, in addition, that RR has a different type of variance.
no code implementations • 9 Feb 2020 • Ahmed Khaled, Peter Richtárik
Moreover, we perform our analysis in a framework which allows for a detailed study of the effects of a wide array of sampling strategies and minibatch sizes for finite-sum optimization problems.
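Two sampling strategies that such a framework has to distinguish, as a small sketch: $\tau$-nice (without-replacement) and with-replacement minibatches give the same unbiased gradient estimator but with different variances, which is what drives the effect of the minibatch size.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_minibatch(n, tau, strategy="tau-nice"):
    """Two standard sampling strategies for a finite sum of n terms: tau-nice sampling
    draws a uniformly random subset of size tau (without replacement), while
    with-replacement sampling draws tau independent indices. Both yield unbiased
    minibatch gradient estimators, but their variances differ."""
    if strategy == "tau-nice":
        return rng.choice(n, size=tau, replace=False)
    return rng.integers(n, size=tau)   # with replacement

# The minibatch estimator itself is the same under either strategy:
# g = mean(grad_i(x) for i in sample_minibatch(n, tau, strategy)).
```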
no code implementations • 20 Dec 2019 • Sélim Chraibi, Ahmed Khaled, Dmitry Kovalev, Peter Richtárik, Adil Salim, Martin Takáč
We propose basic and natural assumptions under which iterative optimization methods with compressed iterates can be analyzed.
no code implementations • 10 Sep 2019 • Ahmed Khaled, Peter Richtárik
We propose and analyze a new type of stochastic first-order method: gradient descent with compressed iterates (GDCI).
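A sketch of the assumed iteration $x_{k+1} = \mathcal{C}(x_k - \gamma \nabla f(x_k))$, here with stochastic rounding as an example of an unbiased compression operator $\mathcal{C}$; the step size, operator, and toy objective are illustrative choices, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(v, grid=0.01):
    """Unbiased compression by stochastic rounding: each coordinate is rounded up or
    down to the nearest grid point with probabilities that preserve the expectation."""
    low = np.floor(v / grid) * grid
    p = (v - low) / grid
    return low + grid * (rng.random(v.shape) < p)

def gdci_sketch(grad, x0, lr=0.1, steps=500):
    """Sketch of gradient descent with compressed iterates, assuming the iterate is
    compressed after each gradient step; the compression error means convergence is
    only to a neighborhood of the solution."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        x = stochastic_round(x - lr * grad(x))   # compress the iterate, not the gradient
    return x

# Toy usage: f(x) = 0.5 * ||x - a||^2 with a hypothetical target a.
a = np.array([0.3, -1.2])
x_hat = gdci_sketch(lambda x: x - a, x0=np.zeros(2))
```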
no code implementations • 10 Sep 2019 • Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
We provide the first convergence analysis of local gradient descent for minimizing the average of smooth and convex but otherwise arbitrary functions.
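A minimal sketch of local gradient descent on a toy heterogeneous problem: each client takes several full-gradient steps on its own objective between communication rounds, and the server averages the resulting iterates.

```python
import numpy as np

def local_gd(client_grads, x0, lr=0.1, local_steps=5, rounds=50):
    """Local gradient descent: every client runs `local_steps` full-gradient steps on
    its own objective, then the server averages the local iterates. With one local
    step per round this reduces to distributed gradient descent."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(rounds):
        local_iterates = []
        for grad_i in client_grads:
            y = x.copy()
            for _ in range(local_steps):
                y = y - lr * grad_i(y)        # local descent on client i's function
            local_iterates.append(y)
        x = np.mean(local_iterates, axis=0)   # communication round: average iterates
    return x

# Toy heterogeneous clients: f_i(x) = 0.5 * ||x - a_i||^2, average minimized at mean(a_i).
a = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 3.0])]
x_hat = local_gd([(lambda x, ai=ai: x - ai) for ai in a], x0=np.zeros(2))
```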
no code implementations • 10 Sep 2019 • Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik
We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous.