no code implementations • ICML 2020 • Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik
Due to the high communication cost in distributed and federated learning problems, methods relying on sparsification or quantization of communicated messages are becoming increasingly popular.
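For intuition, here is a minimal sketch (ours, not from the paper) of a contractive top-$k$ sparsifier, the prototypical compressor in this line of work:

```python
import numpy as np

def topk_compress(g, k):
    """Keep the k largest-magnitude entries of g, zero out the rest.

    A contractive (biased) compressor: ||C(g) - g||^2 <= (1 - k/d) ||g||^2.
    """
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]  # indices of the k largest |g_i|
    out[idx] = g[idx]
    return out

g = np.array([0.1, -3.0, 0.5, 2.0])
print(topk_compress(g, 2))  # only -3.0 and 2.0 survive
```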
no code implementations • 29 Oct 2023 • Sijin Chen, Zhize Li, Yuejie Chi
To our knowledge, Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions.
1 code implementation • 26 Oct 2022 • Lingxiao Huang, Zhize Li, Jialin Sun, Haoyu Zhao
Vertical federated learning (VFL), where data features are distributed across multiple parties, is an important area of machine learning.
no code implementations • 22 Aug 2022 • Zhize Li, Jian Li
We provide a clean and tight analysis of ProxSVRG+, which shows that it outperforms deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, thereby solving an open problem posed in Reddi et al. (2016b).
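To fix notation, a minimal sketch (ours, illustrative) of one ProxGD step for an $\ell_1$ regularizer, where the prox operator is soft-thresholding:

```python
import numpy as np

def soft_threshold(x, tau):
    # prox of tau * ||.||_1: shrink each coordinate toward zero by tau
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proxgd_step(x, grad_f, eta, lam):
    """One proximal gradient step for min_x f(x) + lam * ||x||_1."""
    return soft_threshold(x - eta * grad_f(x), eta * lam)

# toy smooth part: f(x) = 0.5 * ||x - b||^2, so grad_f(x) = x - b
b = np.array([1.0, -0.2, 0.05])
x = np.zeros(3)
for _ in range(100):
    x = proxgd_step(x, lambda z: z - b, eta=0.5, lam=0.1)
print(x)  # small entries of b are shrunk to exactly zero
```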
1 code implementation • 20 Jun 2022 • Zhize Li, Haoyu Zhao, Boyue Li, Yuejie Chi
We then propose a unified framework SoteriaFL for private federated learning, which accommodates a general family of local gradient estimators including popular stochastic variance-reduced gradient methods and the state-of-the-art shifted compression scheme.
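A hedged sketch of the shifted-compression idea (in the spirit of DIANA-type shifts; all names are ours, and SoteriaFL additionally injects privacy noise, which is omitted here): compress the difference between the gradient and a locally maintained shift, then drift the shift toward the gradient.

```python
import numpy as np

def rand_k(g, k, rng):
    """Unbiased random-k sparsifier: keep k random coords, rescale by d/k."""
    d = g.size
    out = np.zeros_like(g)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = g[idx] * (d / k)
    return out

class ShiftedCompressor:
    def __init__(self, d, k, alpha=0.5, seed=0):
        self.h = np.zeros(d)  # the shift, kept in sync with the server
        self.k, self.alpha = k, alpha
        self.rng = np.random.default_rng(seed)

    def communicate(self, grad):
        msg = rand_k(grad - self.h, self.k, self.rng)  # cheap to transmit
        estimate = self.h + msg                        # unbiased estimate of grad
        self.h = self.h + self.alpha * msg             # shift tracks the gradient
        return estimate

c = ShiftedCompressor(d=4, k=1)
for _ in range(5):
    est = c.communicate(np.array([1.0, 2.0, 3.0, 4.0]))
print(est)  # the shift adapts, so the compressed error shrinks over rounds
```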
no code implementations • 2 Feb 2022 • Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin, Elnur Gasanov, Zhize Li, Eduard Gorbunov
We propose and study a new class of gradient communication mechanisms for communication-efficient training -- three point compressors (3PC) -- as well as efficient distributed nonconvex optimization algorithms that can take advantage of them.
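As a concrete illustration (a sketch under our naming, not the paper's code), EF21 is one instance of a 3PC mechanism: the next gradient estimate equals the previous one plus a contractively compressed correction.

```python
import numpy as np

def topk(g, k):
    # contractive top-k compressor: keep the k largest-magnitude entries
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def ef21_update(g_prev, grad_new, k):
    """EF21 as a 3PC instance: compress only the *change* in the gradient,
    so the maintained estimate drifts toward the true gradient."""
    return g_prev + topk(grad_new - g_prev, k)

g = np.zeros(4)
for _ in range(5):
    g = ef21_update(g, np.array([1.0, -2.0, 0.5, 3.0]), k=1)
print(g)  # recovers the true gradient as corrections accumulate
```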
1 code implementation • 31 Jan 2022 • Haoyu Zhao, Boyue Li, Zhize Li, Peter Richtárik, Yuejie Chi
Communication efficiency has been widely recognized as the bottleneck for large-scale decentralized machine learning applications in multi-agent or federated environments.
no code implementations • 24 Dec 2021 • Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik
In the convex setting, COFIG converges within $O(\frac{(1+\omega)\sqrt{N}}{S\epsilon})$ communication rounds, which, to the best of our knowledge, is also the first convergence result for compression schemes that do not communicate with all the clients in each round.
no code implementations • 7 Oct 2021 • Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular mechanism for enforcing convergence of distributed gradient-based optimization methods enhanced with communication compression strategies based on the application of contractive compression operators.
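For reference, a minimal sketch of classic error feedback (our illustrative rendering of the Seide-style mechanism): the residual lost to compression is stored and added back to the next message.

```python
import numpy as np

def sign_compress(v):
    # scaled sign compressor (contractive): 1 bit per coordinate plus one scale
    return np.sign(v) * np.mean(np.abs(v))

class ErrorFeedback:
    """Seide-style error feedback: add back what compression lost last round."""
    def __init__(self, d):
        self.e = np.zeros(d)  # accumulated compression error

    def communicate(self, grad, lr):
        p = lr * grad + self.e  # correct the message by the stored residual
        msg = sign_compress(p)  # what is actually transmitted
        self.e = p - msg        # remember what was lost for the next round
        return msg

ef = ErrorFeedback(d=3)
for _ in range(3):
    print(ef.communicate(np.array([1.0, -0.5, 0.2]), lr=0.1))
```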
1 code implementation • 4 Oct 2021 • Boyue Li, Zhize Li, Yuejie Chi
Emerging applications in multi-agent environments, such as the internet of things, networked sensing, autonomous systems, and federated learning, call for decentralized algorithms for finite-sum optimization that are resource-efficient in terms of both computation and communication.
no code implementations • 10 Aug 2021 • Haoyu Zhao, Zhize Li, Peter Richtárik
We propose a new federated learning algorithm, FedPAGE, which further reduces the communication complexity by using the recent optimal PAGE method (Li et al., 2021) in place of plain SGD in FedAvg.
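A hedged sketch of the local-update FL skeleton this builds on (names ours): each client takes several local steps and the server averages; plugging plain SGD into `local_step` recovers FedAvg, while FedPAGE swaps in a PAGE-type estimator.

```python
import numpy as np

def fedavg_round(x, clients, local_step, K=10, lr=0.1):
    """One round of local-update FL: K local steps per client, then averaging.

    `local_step(x_local, client)` returns the local gradient estimate.
    """
    updates = []
    for c in clients:
        x_local = x.copy()
        for _ in range(K):
            x_local -= lr * local_step(x_local, c)
        updates.append(x_local)
    return np.mean(updates, axis=0)

# toy usage: each "client" holds a target b_i with f_i(x) = 0.5 * ||x - b_i||^2
clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
sgd_step = lambda x, b: x - b
x = np.zeros(2)
for _ in range(20):
    x = fedavg_round(x, clients, sgd_step)
print(x)  # approaches the average of the client optima
```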
no code implementations • NeurIPS 2021 • Zhize Li, Peter Richtárik
Due to the high communication cost in distributed and federated learning, methods relying on compressed communication are becoming increasingly popular.
no code implementations • 17 Jun 2021 • Zhize Li
In this note, we first recall the nonconvex problem setting and introduce the optimal PAGE algorithm (Li et al., ICML'21).
no code implementations • 21 Mar 2021 • Zhize Li
ii) For strongly convex finite-sum problems, we also show that ANITA can achieve the optimal convergence rate $O\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ matching the lower bound $\Omega\big((n+\sqrt{\frac{nL}{\mu}})\log\frac{1}{\epsilon}\big)$ provided by Lan and Zhou (2015).
no code implementations • 2 Mar 2021 • Zhize Li, Slavomír Hanzely, Peter Richtárik
Avoiding any full gradient computations (which are time-consuming steps) is important in many applications as the number of data samples $n$ usually is very large.
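For context, SARAH-type methods maintain a recursive gradient estimator; vanilla SARAH periodically resets it with a full gradient pass, which ZeroSARAH is designed to avoid. A minimal sketch of the recursive update (ours, illustrative, not the exact ZeroSARAH estimator):

```python
def sarah_step(v_prev, x_new, x_old, grad_i):
    """Recursive SARAH estimator: v_t = grad_i(x_t) - grad_i(x_{t-1}) + v_{t-1},
    where grad_i is the gradient of a single sampled component f_i."""
    return grad_i(x_new) - grad_i(x_old) + v_prev
```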
1 code implementation • 15 Feb 2021 • Eduard Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik
Unlike virtually all competing distributed first-order methods, including DIANA, ours is based on a carefully designed biased gradient estimator, which is the key to its superior theoretical and practical performance.
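A hedged single-node sketch of the biased estimator at the heart of MARINA (distributed MARINA has each node compress its own difference): with small probability send the full gradient; otherwise send only a compressed gradient difference.

```python
import numpy as np

def rand_k(g, k, rng):
    # unbiased random-k sparsifier: keep k random coords, rescale by d/k
    d = g.size
    out = np.zeros_like(g)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = g[idx] * (d / k)
    return out

def marina_message(g_prev, grad_new, grad_old, p, k, rng):
    """MARINA-style update: full gradient w.p. p, compressed delta otherwise."""
    if rng.random() < p:
        return grad_new                                  # rare, expensive sync
    return g_prev + rand_k(grad_new - grad_old, k, rng)  # cheap compressed delta
```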
no code implementations • 25 Aug 2020 • Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik
Then, we show that PAGE obtains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and $O(b+\frac{\sqrt{b}}{\epsilon^2})$ (online) matching our lower bounds for both nonconvex finite-sum and online problems.
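A minimal sketch of the PAGE estimator (ours, slightly simplified: the paper allows a minibatch rather than a full pass in the first branch): with probability $p$ recompute the gradient, and otherwise reuse the previous estimate corrected by gradient differences on a small batch.

```python
import numpy as np

def page_estimator(g_prev, x_new, x_old, grads, p, batch, rng):
    """PAGE: w.p. p recompute the (mini)batch gradient; otherwise reuse g_prev
    plus averaged gradient differences on a small batch.

    `grads[i](x)` returns the gradient of component f_i at x.
    """
    n = len(grads)
    if rng.random() < p:
        return np.mean([grads[i](x_new) for i in range(n)], axis=0)  # full pass
    idx = rng.choice(n, size=batch, replace=True)
    diff = np.mean([grads[i](x_new) - grads[i](x_old) for i in idx], axis=0)
    return g_prev + diff
```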
no code implementations • 12 Jun 2020 • Zhize Li, Peter Richtárik
We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant.
no code implementations • 26 Feb 2020 • Zhize Li, Dmitry Kovalev, Xun Qian, Peter Richtárik
Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular.
no code implementations • NeurIPS 2019 • Guanghui Lan, Zhize Li, Yi Zhou
Moreover, Varag is the first accelerated randomized incremental gradient method that benefits from the strong convexity of the data-fidelity term to achieve the optimal linear convergence.
no code implementations • 1 May 2019 • Rong Ge, Zhize Li, Wei-Yao Wang, Xiang Wang
Variance reduction techniques like SVRG provide simple and fast algorithms for optimizing a convex finite-sum objective.
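For reference, the classic SVRG estimator for a finite sum $f = \frac{1}{n}\sum_i f_i$: correct a single-component gradient by its value at a snapshot point plus the full gradient there. A self-contained toy run:

```python
import numpy as np

# toy least-squares finite sum: f_i(x) = 0.5 * (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])

x = np.zeros(5)
for epoch in range(30):
    snap = x.copy()
    mu = A.T @ (A @ snap - b) / len(b)           # full gradient at the snapshot
    for _ in range(len(b)):
        i = rng.integers(len(b))
        v = grad_i(x, i) - grad_i(snap, i) + mu  # unbiased, variance-reduced
        x -= 0.01 * v
print(np.linalg.norm(A.T @ (A @ x - b)) / len(b))  # near-stationary
```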
no code implementations • NeurIPS 2019 • Zhize Li
We emphasize that SSRGD finds second-order stationary points almost as simply as it finds first-order stationary points, merely by occasionally adding a uniform perturbation, whereas all other algorithms with similar gradient complexity for finding second-order stationary points must be combined with a negative-curvature search subroutine (e.g., Neon2 [Allen-Zhu and Li, 2018]).
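A hedged sketch of that perturbation idea (ours, with illustrative thresholds): when the gradient is tiny, the iterate may sit near a saddle, so inject noise drawn uniformly from a small ball.

```python
import numpy as np

def perturb_if_stuck(x, grad, radius=1e-2, thresh=1e-3,
                     rng=np.random.default_rng(0)):
    """Add a uniform-ball perturbation when the gradient is tiny (possible saddle)."""
    if np.linalg.norm(grad) <= thresh:
        u = rng.normal(size=x.size)
        # uniform direction times radius * U^(1/d) gives a uniform sample in a ball
        u *= rng.uniform() ** (1.0 / x.size) * radius / np.linalg.norm(u)
        return x + u
    return x

print(perturb_if_stuck(np.zeros(2), grad=np.zeros(2)))  # x plus a small kick
```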
no code implementations • ICLR 2019 • Rong Ge, Rohith Kuditipudi, Zhize Li, Xiang Wang
We give a new algorithm for learning a two-layer neural network under a general class of input distributions.
no code implementations • 7 Sep 2018 • Zhize Li, Jian Li
Moreover, if hyperparameters (e.g., the Lipschitz smoothness parameter $L$) are not available, we propose an algorithm that guesses them dynamically and prove a similar convergence rate.
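A sketch of the standard doubling strategy for an unknown $L$ (illustrative; not necessarily the paper's exact procedure): take a trial step and, whenever the quadratic upper bound fails, double the guess.

```python
import numpy as np

def gd_step_guess_L(x, f, grad_f, L=1.0):
    """Gradient step with a dynamically guessed smoothness constant.

    Doubles L until f(x+) <= f(x) + <g, x+ - x> + L/2 * ||x+ - x||^2 holds.
    """
    g = grad_f(x)
    while True:
        x_new = x - g / L
        if f(x_new) <= f(x) + g @ (x_new - x) + 0.5 * L * np.sum((x_new - x) ** 2):
            return x_new, L
        L *= 2.0

f = lambda x: 0.5 * np.sum(x ** 2)  # true smoothness constant is 1
x, L = np.array([3.0, -1.0]), 1e-3
for _ in range(50):
    x, L = gd_step_guess_L(x, f, lambda z: z, L)
print(x, L)  # x -> 0, and L settles near the true constant
```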
no code implementations • 29 Mar 2018 • Zhize Li, Tianyi Zhang, Shuyu Cheng, Jun Zhu, Jian Li
In this paper, we apply variance reduction techniques to Hamiltonian Monte Carlo and achieve better theoretical convergence results than variance-reduced Langevin dynamics.
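A hedged sketch of where such an estimator plugs in: the leapfrog integrator of HMC only needs some gradient of the potential $U$, so a variance-reduced estimate (e.g., the SVRG-style one sketched above) can replace the exact or naive stochastic gradient. This is illustrative, not the paper's exact algorithm.

```python
import numpy as np

def leapfrog(theta, r, grad_U_est, eps=1e-2, steps=10):
    """Leapfrog integrator with a plug-in (possibly variance-reduced) grad of U."""
    r = r - 0.5 * eps * grad_U_est(theta)
    for _ in range(steps - 1):
        theta = theta + eps * r
        r = r - eps * grad_U_est(theta)
    theta = theta + eps * r
    r = r - 0.5 * eps * grad_U_est(theta)
    return theta, r

# e.g. U(theta) = 0.5 * ||theta||^2, using the exact gradient as the estimator
theta, r = leapfrog(np.array([1.0]), np.array([0.0]), lambda t: t)
print(theta, r)
```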
1 code implementation • 15 Feb 2018 • Yu Shi, Jian Li, Zhize Li
We show that PL Trees can accelerate convergence of GBDT and improve the accuracy.
no code implementations • NeurIPS 2018 • Zhize Li, Jian Li
In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., 2017] for the smooth nonconvex case.
2 code implementations • 26 Oct 2016 • Zhize Li, Jian Li, Hongwei Huo
The open problem asks for in-place algorithms running in $o(n\log n)$ time and, ultimately, in $O(n)$ time for (read-only) integer alphabets with $|\Sigma| \leq n$.
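For contrast, the naive sorting-based construction below shows what a suffix array is; it is far from the paper's in-place linear-time goal.

```python
def naive_suffix_array(s):
    """O(n^2 log n) construction by sorting all suffixes; the paper's point is
    achieving O(n) time with O(1) extra space for integer alphabets."""
    return sorted(range(len(s)), key=lambda i: s[i:])

print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```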
no code implementations • NeurIPS 2015 • Wei Cao, Jian Li, Yufei Tao, Zhize Li
This paper discusses how to efficiently choose from $n$ unknown distributions the $k$ ones whose means are the greatest by a certain metric, up to a small relative error.
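For intuition, a sketch of the naive uniform-sampling baseline that such work improves upon (ours, illustrative): sample every arm the same number of times and return the $k$ largest empirical means.

```python
import numpy as np

def naive_topk(sample, n, k, m):
    """Sample each of the n distributions m times and return the k arms with
    the largest empirical means. `sample(i)` draws one observation from arm i."""
    means = np.array([np.mean([sample(i) for _ in range(m)]) for i in range(n)])
    return np.argsort(means)[-k:][::-1]

rng = np.random.default_rng(1)
true_means = np.linspace(0.0, 1.0, 6)
print(naive_topk(lambda i: rng.normal(true_means[i], 0.1), n=6, k=2, m=200))
# -> [5 4], the two arms with the highest true means
```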