no code implementations • 6 Nov 2023 • Yuan Gao, Rustem Islamov, Sebastian Stich
Error Compensation (EC) is a widely used mechanism for mitigating the errors introduced by contractive compression operators during distributed training.
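As a rough illustration of the error-compensation idea (a minimal sketch, not the method analyzed in the paper), the snippet below applies error feedback to compressed SGD with a Top-K compressor; the names `top_k` and `ec_sgd_step` are illustrative.

```python
import numpy as np

def top_k(v, k):
    """Contractive Top-K compressor: keep only the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ec_sgd_step(x, grad, error, lr, k):
    """One SGD step with error compensation (error feedback).

    The compression error left over from the previous round is added back to
    the current update before compressing, so dropped coordinates are not
    lost forever but re-transmitted later.
    """
    corrected = lr * grad + error      # reinject the accumulated error
    update = top_k(corrected, k)       # only the compressed part is communicated
    error = corrected - update         # remember what was dropped this round
    return x - update, error
```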
no code implementations • 31 Oct 2023 • Rustem Islamov, Mher Safaryan, Dan Alistarh
As a by-product of our analysis, we also establish convergence guarantees for gradient-type algorithms such as SGD with random reshuffling and shuffle-once mini-batch SGD.
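For context, the sketch below contrasts the two shuffling schemes named here; `grads(i, x)` is a hypothetical oracle returning the gradient of the i-th component function, and the loop structure is a generic illustration rather than the algorithm analyzed in the paper.

```python
import numpy as np

def sgd_epochwise(x0, grads, n, epochs, lr, mode="rr", seed=0):
    """SGD over n component functions, visiting each exactly once per epoch.

    mode="rr": random reshuffling (a fresh permutation every epoch);
    mode="so": shuffle-once (a single permutation reused in all epochs).
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    perm = rng.permutation(n)            # fixed order used by shuffle-once
    for _ in range(epochs):
        if mode == "rr":
            perm = rng.permutation(n)    # reshuffle at the start of each epoch
        for i in perm:
            x -= lr * grads(i, x)
    return x
```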
no code implementations • 30 May 2023 • Sarit Khirirat, Eduard Gorbunov, Samuel Horváth, Rustem Islamov, Fakhri Karray, Peter Richtárik
Motivated by the increasing popularity and importance of large-scale training under differential privacy (DP) constraints, we study distributed gradient methods with gradient clipping, i.e., clipping applied to the gradients computed from local information at the nodes.
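A minimal sketch of this setting, assuming simple full-gradient clipping at each node followed by server-side averaging; the function names are illustrative and no DP noise is added here.

```python
import numpy as np

def clip(g, c):
    """Rescale g to have Euclidean norm at most c (standard gradient clipping)."""
    norm = np.linalg.norm(g)
    return g if norm <= c else (c / norm) * g

def clipped_distributed_step(x, local_grads, lr, c):
    """One round of distributed GD: each node clips its local gradient before
    sending it; the server averages the clipped gradients and takes a step."""
    clipped = [clip(g(x), c) for g in local_grads]    # computed at the nodes
    return x - lr * np.mean(clipped, axis=0)          # aggregation at the server
```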
no code implementations • 29 May 2023 • Konstantin Mishchenko, Rustem Islamov, Eduard Gorbunov, Samuel Horváth
We present a partially personalized formulation of Federated Learning (FL) that strikes a balance between the flexibility of personalization and the cooperativeness of global training.
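One way to picture such a split, as a hedged sketch only: the parameters are divided into a shared block that is averaged across clients and a personal block that never leaves the client. The round structure and names below are assumptions for illustration, not the formulation proposed in the paper.

```python
import numpy as np

def partially_personalized_round(shared, personal, clients, lr, local_steps):
    """One communication round with a shared parameter block and per-client
    personal blocks.

    clients[i] is a function grad_i(shared, personal_i) returning gradients
    w.r.t. both blocks. Only the shared block is averaged by the server."""
    new_shared = []
    for i, grad_i in enumerate(clients):
        u = shared.copy()
        for _ in range(local_steps):
            gu, gv = grad_i(u, personal[i])
            u -= lr * gu                   # local work on the shared block
            personal[i] -= lr * gv         # personal block stays on the client
        new_shared.append(u)
    return np.mean(new_shared, axis=0), personal
```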
no code implementations • 31 Oct 2022 • Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtárik
We propose Adaptive Compressed Gradient Descent (AdaCGD), a novel optimization algorithm for communication-efficient training of supervised machine learning models with an adaptive compression level.
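The adaptive-level idea can be illustrated with a Top-K compressor whose K is chosen anew at each step. The energy-fraction rule below (`adaptive_k` with parameter `alpha`) is a made-up stand-in for illustration, not the selection mechanism used by AdaCGD.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def adaptive_k(v, alpha):
    """Illustrative rule: the smallest k whose Top-K part captures an alpha
    fraction of the vector's squared norm."""
    sq = np.sort(np.abs(v))[::-1] ** 2
    cum = np.cumsum(sq)
    return int(np.searchsorted(cum, alpha * cum[-1]) + 1)

def adaptive_compressed_step(x, grad, lr, alpha):
    g = grad(x)
    k = adaptive_k(g, alpha)       # compression level adapts to the current gradient
    return x - lr * top_k(g, k)
```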
no code implementations • 7 Jun 2022 • Rustem Islamov, Xun Qian, Slavomír Hanzely, Mher Safaryan, Peter Richtárik
Despite their high computation and communication costs, Newton-type methods remain an appealing option for distributed training due to their robustness against ill-conditioned convex problems.
no code implementations • 2 Nov 2021 • Xun Qian, Rustem Islamov, Mher Safaryan, Peter Richtárik
Recent advances in distributed optimization have shown that Newton-type methods with proper communication compression mechanisms can guarantee fast local convergence rates and low communication cost compared to first-order methods.
no code implementations • 5 Jun 2021 • Mher Safaryan, Rustem Islamov, Xun Qian, Peter Richtárik
In contrast to the aforementioned work, FedNL employs a different Hessian learning technique which i) enhances privacy as it does not require the training data to be revealed to the coordinating server, ii) makes it applicable beyond generalized linear models, and iii) provably works with general contractive compression operators for compressing the local Hessians, such as Top-$K$ or Rank-$R$, which are vastly superior in practice.
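To make the Hessian-learning step concrete, here is a hedged sketch using a Rank-$R$ compressor built from a truncated SVD; the step size `alpha` and the single-node view are simplifying assumptions, not the full FedNL algorithm.

```python
import numpy as np

def rank_r(M, r):
    """Rank-R compressor: best rank-r approximation of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

def hessian_learning_step(H_est, H_local, r, alpha=1.0):
    """One round of compressed Hessian learning: the node communicates only
    the compressed difference between its local Hessian and the current
    estimate, which both node and server keep in sync."""
    delta = rank_r(H_local - H_est, r)   # only this compressed matrix is sent
    return H_est + alpha * delta
```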
no code implementations • 14 Feb 2021 • Rustem Islamov, Xun Qian, Peter Richtárik
Finally, we develop a globalization strategy using cubic regularization which leads to our next method, CUBIC-NEWTON-LEARN, for which we prove global sublinear and linear convergence rates, and a fast superlinear rate.
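As a worked illustration of a cubic-regularized Newton step (a generic sketch assuming the Hessian $H$ is symmetric positive semidefinite, not the compressed CUBIC-NEWTON-LEARN update itself): the subproblem minimizer satisfies $(H + \tfrac{M}{2}\|s\| I)s = -g$, which can be solved by bisection on the scalar $r = \|s\|$.

```python
import numpy as np

def cubic_newton_step(g, H, M, tol=1e-10):
    """Solve the cubic-regularized Newton subproblem
        min_s  g^T s + 0.5 s^T H s + (M/6) ||s||^3
    for symmetric PSD H, via bisection on r = ||s|| using the optimality
    condition (H + (M/2) r I) s = -g with ||s|| = r."""
    def s_of(r):
        return np.linalg.solve(H + 0.5 * M * r * np.eye(len(g)), -g)

    lo, hi = 0.0, 1.0
    while np.linalg.norm(s_of(hi)) > hi:   # grow the bracket until ||s(hi)|| <= hi
        hi *= 2.0
    for _ in range(200):                   # bisection on the scalar r
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(s_of(mid)) > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return s_of(hi)
```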