Search Results for author: Ali Ramezani-Kebrya

Found 12 papers, 3 papers with code

Distributed Extra-gradient with Optimal Complexity and Communication Guarantees

1 code implementation • 17 Aug 2023 • Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, Volkan Cevher

We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local stochastic dual vectors.
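As background for this entry, here is a minimal, single-worker sketch of the stochastic extra-gradient update for a monotone VI; it deliberately omits the distributed communication and compression machinery the paper is about, and the names (`operator`, `project`) are illustrative.

```python
import numpy as np

def extragradient_step(x, operator, step_size, project=lambda z: z):
    """One stochastic extra-gradient step for a monotone variational
    inequality: find x* with <F(x*), x - x*> >= 0 for all feasible x.
    `operator` returns a (possibly stochastic) dual vector F(x);
    `project` maps a point back onto the feasible set."""
    # Extrapolation: probe the operator at a leading point.
    x_lead = project(x - step_size * operator(x))
    # Update: step from x using the operator evaluated at the leading point.
    return project(x - step_size * operator(x_lead))

# Toy usage: the unconstrained bilinear saddle point min_u max_v u*v,
# whose VI operator is F(u, v) = (v, -u) and whose solution is (0, 0).
F = lambda z: np.array([z[1], -z[0]])
z = np.array([1.0, 1.0])
for _ in range(200):
    z = extragradient_step(z, F, step_size=0.3)
print(z)  # approaches (0, 0); plain gradient steps would diverge here
```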

Federated Learning under Covariate Shifts with Generalization Guarantees

no code implementations • 8 Jun 2023 • Ali Ramezani-Kebrya, Fanghui Liu, Thomas Pethick, Grigorios Chrysos, Volkan Cevher

This paper addresses intra-client and inter-client covariate shifts in federated learning (FL) with a focus on the overall generalization performance.

Federated Learning
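One standard way to correct covariate shift, which may help make the setting concrete, is importance-weighted empirical risk: reweight each client's samples by an estimated density ratio between the target and client feature distributions. The sketch below is a generic illustration under that assumption (the helper `density_ratio` is hypothetical); it is not claimed to be the paper's exact objective.

```python
import numpy as np

def client_weighted_risk(loss_fn, features, labels, density_ratio):
    """Importance-weighted empirical risk on one client under covariate shift.
    Each sample x is reweighted by w(x) ~= p_target(x) / p_client(x), so that
    minimizing the weighted risk targets the test-time feature distribution.
    `density_ratio` is a hypothetical, externally supplied estimator of w(x)."""
    weights = np.array([density_ratio(x) for x in features])
    losses = np.array([loss_fn(x, y) for x, y in zip(features, labels)])
    return float(np.mean(weights * losses))

def global_weighted_risk(clients, loss_fn, density_ratios):
    """Average per-client weighted risks, weighting clients by sample count."""
    sizes = np.array([len(c["labels"]) for c in clients], dtype=float)
    risks = np.array([
        client_weighted_risk(loss_fn, c["features"], c["labels"], dr)
        for c, dr in zip(clients, density_ratios)
    ])
    return float(np.sum(sizes / sizes.sum() * risks))
```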

MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks

no code implementations • 16 Jul 2022 • Ali Ramezani-Kebrya, Iman Tabrizian, Fartash Faghri, Petar Popovski

We introduce MixTailor, a scheme based on randomizing the aggregation strategy, which makes it impossible for the attacker to be fully informed.
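To make the idea concrete, the sketch below samples an aggregation rule at random from a small pool of standard robust aggregators at each round, so an attacker cannot tailor its updates to one fixed, known rule; the particular pool and sampling distribution are illustrative choices, not necessarily those used in MixTailor.

```python
import numpy as np

def trimmed_mean(grads, trim=1):
    """Coordinate-wise trimmed mean: drop the `trim` largest and smallest
    values in each coordinate before averaging."""
    s = np.sort(grads, axis=0)
    return s[trim:len(grads) - trim].mean(axis=0)

AGGREGATORS = [
    lambda g: np.mean(g, axis=0),    # plain averaging
    lambda g: np.median(g, axis=0),  # coordinate-wise median
    trimmed_mean,                    # coordinate-wise trimmed mean
]

def randomized_aggregate(worker_grads, rng):
    """Sample an aggregation rule at random each round, so an attacker
    cannot tailor malicious updates to one fixed, known rule."""
    grads = np.stack(worker_grads)
    rule = AGGREGATORS[rng.integers(len(AGGREGATORS))]
    return rule(grads)

# Usage: rng = np.random.default_rng(0); g = randomized_aggregate(grads_list, rng)
```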

Subquadratic Overparameterization for Shallow Neural Networks

no code implementations • NeurIPS 2021 • ChaeHwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher

Overparameterization refers to the important phenomenon where the width of a neural network is chosen such that learning algorithms can provably attain zero loss in nonconvex training.
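For readers new to this line of work, "shallow" means a single hidden layer, and the overparameterization knob is the hidden width m. The snippet below only fixes that notation (with an assumed Gaussian initialization); it does not encode the paper's specific subquadratic width bound.

```python
import numpy as np

def shallow_relu_net(x, W, a):
    """One-hidden-layer (shallow) ReLU network f(x) = a^T ReLU(W x).
    The hidden width m = W.shape[0] is the overparameterization knob: the
    question is how large m must be, relative to the number of training
    samples, for training to provably reach zero loss."""
    return a @ np.maximum(W @ x, 0.0)

d, m = 10, 256                              # input dimension, hidden width
rng = np.random.default_rng(0)
W = rng.normal(size=(m, d)) / np.sqrt(d)    # hidden-layer weights
a = rng.normal(size=m) / np.sqrt(m)         # output weights
print(shallow_relu_net(rng.normal(size=d), W, a))
```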

Linear Convergence of SGD on Overparametrized Shallow Neural Networks

no code implementations • 29 Sep 2021 • Paul Rolland, Ali Ramezani-Kebrya, ChaeHwan Song, Fabian Latorre, Volkan Cevher

Despite the non-convex landscape, first-order methods can be shown to reach global minima when training overparameterized neural networks, where the number of parameters far exceeds the number of training samples.

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

no code implementations • 28 Apr 2021 • Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training.

Quantization
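As a rough illustration of the idea, the sketch below quantizes a gradient with exponentially spaced (nonuniform) levels and unbiased stochastic rounding; the actual level placement, sign/norm encoding, and bit packing in NUQSGD differ, so treat this as generic gradient quantization rather than the paper's scheme.

```python
import numpy as np

def quantize_nonuniform(grad, num_levels=4, rng=None):
    """Quantize a gradient with exponentially spaced (nonuniform) levels.
    Levels are {0, 2^-(L-1), ..., 2^-1, 1} times ||grad||; each normalized
    magnitude is stochastically rounded to a neighboring level, so the
    quantizer is unbiased. Illustrative only, not NUQSGD's exact encoding."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(grad)
    if norm == 0:
        return grad
    levels = np.concatenate(([0.0], 2.0 ** np.arange(-(num_levels - 1), 1)))
    r = np.abs(grad) / norm                     # normalized magnitudes in [0, 1]
    idx = np.clip(np.searchsorted(levels, r, side="right") - 1, 0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    p_up = (r - lo) / (hi - lo)                 # probability of rounding up
    rounded = np.where(rng.random(r.shape) < p_up, hi, lo)
    return norm * np.sign(grad) * rounded

# Usage: q = quantize_nonuniform(np.array([0.9, -0.05, 0.3]))
```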

On the Generalization of Stochastic Gradient Descent with Momentum

no code implementations • 26 Feb 2021 • Ali Ramezani-Kebrya, Ashish Khisti, Ben Liang

While momentum-based methods, in conjunction with stochastic gradient descent (SGD), are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods.

BIG-bench Machine Learning
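For reference, the update studied in this line of work is (a variant of) SGD with heavy-ball momentum; the generic form is sketched below, with the caveat that the exact parameterization analyzed in each paper may differ.

```python
import numpy as np

def sgd_momentum_step(w, v, stochastic_grad, lr=0.01, momentum=0.9):
    """One heavy-ball momentum step:
        v_{t+1} = momentum * v_t + g_t      (g_t: stochastic gradient at w_t)
        w_{t+1} = w_t - lr * v_{t+1}
    Stability-based analyses bound how much this trajectory changes when a
    single training example is replaced, which controls generalization error."""
    g = stochastic_grad(w)
    v = momentum * v + g
    return w - lr * v, v

# Toy usage on the quadratic loss 0.5 * ||w||^2 with noisy gradients.
rng = np.random.default_rng(0)
grad = lambda w: w + 0.01 * rng.normal(size=w.shape)
w, v = np.ones(3), np.zeros(3)
for _ in range(500):
    w, v = sgd_momentum_step(w, v, grad)
print(w)  # close to the minimizer at the origin
```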

Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

no code implementations • 25 Sep 2019 • Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.

Quantization

NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization

1 code implementation • 16 Aug 2019 • Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.

Quantization

Stability of Stochastic Gradient Method with Momentum for Strongly Convex Loss Functions

no code implementations • ICLR 2019 • Ali Ramezani-Kebrya, Ashish Khisti, Ben Liang

While momentum-based methods, in conjunction with stochastic gradient descent, are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods.

On the Generalization of Stochastic Gradient Descent with Momentum

no code implementations • 12 Sep 2018 • Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods.

BIG-bench Machine Learning
