Distributed Optimization
77 papers with code • 0 benchmarks • 0 datasets
The goal of Distributed Optimization is to optimize an objective defined over millions or billions of data points distributed across many machines, by exploiting the combined computational power of those machines.
Source: Analysis of Distributed Stochastic Dual Coordinate Ascent
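In most of the papers below, this takes the canonical finite-sum form

$$\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),$$

where $f_i$ is the local objective defined by the data held on machine (or client) $i$.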
Latest papers
FairSync: Ensuring Amortized Group Exposure in Distributed Recommendation Retrieval
Specifically, FairSync enforces amortized group exposure by moving the problem to the dual space, where a central node aggregates historical fairness data into a vector and distributes it to all servers.
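A minimal sketch of the aggregate-and-broadcast pattern described above, in NumPy; the function name and the choice to sum the per-server vectors are illustrative assumptions, not FairSync's actual interface:

```python
import numpy as np

def sync_fairness_vector(local_stats):
    """local_stats: list of per-server vectors of historical group exposure."""
    aggregated = np.sum(local_stats, axis=0)          # central node aggregates
    return [aggregated.copy() for _ in local_stats]   # ...and broadcasts back
```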
Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers
Many machine learning applications require operating on a spatially distributed dataset.
Asynchronous Local-SGD Training for Language Modeling
Local stochastic gradient descent (Local-SGD), also referred to as federated averaging, is an approach to distributed optimization where each device performs more than one SGD update per communication round.
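For concreteness, here is a minimal synchronous Local-SGD sketch in NumPy; the asynchronous variant studied in the paper relaxes the blocking averaging step. The least-squares setup and all names are illustrative assumptions:

```python
import numpy as np

def local_sgd(shards, x0, lr=0.05, rounds=20, local_steps=8, seed=0):
    """Synchronous Local-SGD on least-squares shards [(A_i, b_i), ...]."""
    rng = np.random.default_rng(seed)
    x_global = x0.copy()
    for _ in range(rounds):
        local_models = []
        for A, b in shards:
            x = x_global.copy()
            for _ in range(local_steps):       # several SGD updates...
                j = rng.integers(len(b))       # ...each on one local example
                g = (A[j] @ x - b[j]) * A[j]   # stochastic gradient
                x = x - lr * g
            local_models.append(x)
        x_global = np.mean(local_models, axis=0)  # one communication round
    return x_global
```

Each worker communicates once per round rather than once per gradient step, which is what makes the method attractive when communication is the bottleneck.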
Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates
Byzantine robustness is an essential feature of algorithms for certain distributed optimization problems, typically encountered in collaborative/federated learning.
Differentially Private Distributed Estimation and Learning
We show that the noise that minimizes the convergence time to the best estimates is the Laplace noise, with parameters corresponding to each agent's sensitivity to their signal and network characteristics.
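The Laplace mechanism itself is standard; here is a hedged sketch of how a single agent might perturb its local estimate before release (function and parameter names are assumptions):

```python
import numpy as np

def privatize_estimate(local_estimate, sensitivity, epsilon, rng=None):
    """Release local_estimate with epsilon-differential privacy."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon   # Laplace scale b = sensitivity / epsilon
    noise = rng.laplace(0.0, scale, size=np.shape(local_estimate))
    return local_estimate + noise
```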
Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
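One way shared randomness can cut that cost, sketched in NumPy: sender and receiver derive identical random directions from a shared seed, so only coarsely quantized projection coefficients cross the wire. This is a hedged illustration of the general idea, not the paper's exact protocol:

```python
import numpy as np

def compress(grad, seed, k=16):
    """Project grad onto k shared random directions; transmit k int8 values."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((k, grad.size))
    coeffs = dirs @ grad
    scale = max(float(np.max(np.abs(coeffs))) / 127.0, 1e-12)
    return (coeffs / scale).astype(np.int8), scale   # k bytes plus one float

def decompress(q, scale, seed, dim, k=16):
    """Receiver regenerates the same directions from the shared seed."""
    rng = np.random.default_rng(seed)
    dirs = rng.standard_normal((k, dim))
    # (1/k) * sum_j (d_j . g) d_j estimates g, up to quantization error
    return dirs.T @ (q.astype(np.float64) * scale) / k
```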
Error Feedback Shines when Features are Rare
To illustrate our main result, we show that in order to find a random vector $\hat{x}$ such that $\lVert {\nabla f(\hat{x})} \rVert^2 \leq \varepsilon$ in expectation, ${\color{green}\sf GD}$ with the ${\color{green}\sf Top1}$ sparsifier and ${\color{green}\sf EF}$ requires ${\cal O} \left(\left( L+{\color{blue}r} \sqrt{ \frac{{\color{red}c}}{n} \min \left( \frac{{\color{red}c}}{n} \max_i L_i^2, \frac{1}{n}\sum_{i=1}^n L_i^2 \right) }\right) \frac{1}{\varepsilon} \right)$ bits to be communicated by each worker to the server only, where $L$ is the smoothness constant of $f$, $L_i$ is the smoothness constant of $f_i$, ${\color{red}c}$ is the maximal number of clients owning any feature ($1\leq {\color{red}c} \leq n$), and ${\color{blue}r}$ is the maximal number of features owned by any client ($1\leq {\color{blue}r} \leq d$).
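The bound concerns GD with Top-1 compression and error feedback; the classical error-feedback template it builds on looks like this (a sketch, with Top-k in place of Top-1 and illustrative names):

```python
import numpy as np

def topk(v, k=1):
    """Keep only the k largest-magnitude entries of v."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef_step(x, grads, errors, lr=0.1, k=1):
    """grads: per-worker gradients; errors: per-worker residual memories."""
    msgs = []
    for i, g in enumerate(grads):
        m = topk(errors[i] + g, k)      # compress the error-corrected gradient
        errors[i] = errors[i] + g - m   # remember what was not transmitted
        msgs.append(m)
    return x - lr * np.mean(msgs, axis=0), errors
```

The residual memory is what lets sparsified updates converge: every coordinate is eventually transmitted once its accumulated error grows large enough.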
On the Convergence of Decentralized Federated Learning Under Imperfect Information Sharing
The first algorithm, Federated Noisy Decentralized Learning (FedNDL1), comes from the literature; noise is added to the agents' parameters before they are shared, to simulate noisy communication channels.
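A hedged sketch of one such noisy gossip round in NumPy; the mixing-then-gradient ordering and all names are assumptions rather than FedNDL1's exact update:

```python
import numpy as np

def noisy_gossip_round(params, W, grads, lr=0.1, sigma=0.01, rng=None):
    """params: (n_agents, dim); W: doubly stochastic (n_agents, n_agents)."""
    rng = rng or np.random.default_rng()
    noisy = params + rng.normal(0.0, sigma, size=params.shape)  # channel noise
    mixed = W @ noisy               # each agent averages its neighbors' copies
    return mixed - lr * grads       # then takes a local gradient step
```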
Byzantine-Robust Loopless Stochastic Variance-Reduced Gradient
Distributed optimization with open collaboration is a popular field, since it gives small groups, companies, universities, and individuals the opportunity to jointly solve huge-scale problems.
TAMUNA: Doubly Accelerated Federated Learning with Local Training, Compression, and Partial Participation
In federated learning, a large number of users collaborate to learn a global model.