Search Results for author: Ali Ramezani-Kebrya

Found 12 papers, 3 papers with code

Distributed Extra-gradient with Optimal Complexity and Communication Guarantees

1 code implementation • 17 Aug 2023 • Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, Volkan Cevher

We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local stochastic dual vectors.
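As background for this entry, here is a minimal, single-worker sketch of the stochastic extra-gradient update for a monotone VI; it deliberately omits the distributed communication and compression machinery the paper is about, and the names (`operator`, `project`) are illustrative.

```python
import numpy as np

def extragradient_step(x, operator, step_size, project=lambda z: z):
    """One stochastic extra-gradient step for a monotone variational
    inequality: find x* with <F(x*), x - x*> >= 0 for all feasible x.
    `operator` returns a (possibly stochastic) dual vector F(x);
    `project` maps a point back onto the feasible set."""
    # Extrapolation: probe the operator at a leading point.
    x_lead = project(x - step_size * operator(x))
    # Update: step from x using the operator evaluated at the leading point.
    return project(x - step_size * operator(x_lead))

# Toy usage: the unconstrained bilinear saddle point min_u max_v u*v,
# whose VI operator is F(u, v) = (v, -u) and whose solution is (0, 0).
F = lambda z: np.array([z[1], -z[0]])
z = np.array([1.0, 1.0])
for _ in range(200):
    z = extragradient_step(z, F, step_size=0.3)
print(z)  # approaches (0, 0); plain gradient steps would diverge here
```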

Federated Learning under Covariate Shifts with Generalization Guarantees

no code implementations • 8 Jun 2023 • Ali Ramezani-Kebrya, Fanghui Liu, Thomas Pethick, Grigorios Chrysos, Volkan Cevher

This paper addresses intra-client and inter-client covariate shifts in federated learning (FL) with a focus on the overall generalization performance.

Federated Learning
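One standard way to correct covariate shift, which may help make the setting concrete, is importance-weighted empirical risk: reweight each client's samples by an estimated density ratio between the target and client feature distributions. The sketch below is a generic illustration under that assumption (the helper `density_ratio` is hypothetical); it is not claimed to be the paper's exact objective.

```python
import numpy as np

def client_weighted_risk(loss_fn, features, labels, density_ratio):
    """Importance-weighted empirical risk on one client under covariate shift.
    Each sample x is reweighted by w(x) ~= p_target(x) / p_client(x), so that
    minimizing the weighted risk targets the test-time feature distribution.
    `density_ratio` is a hypothetical, externally supplied estimator of w(x)."""
    weights = np.array([density_ratio(x) for x in features])
    losses = np.array([loss_fn(x, y) for x, y in zip(features, labels)])
    return float(np.mean(weights * losses))

def global_weighted_risk(clients, loss_fn, density_ratios):
    """Average per-client weighted risks, weighting clients by sample count."""
    sizes = np.array([len(c["labels"]) for c in clients], dtype=float)
    risks = np.array([
        client_weighted_risk(loss_fn, c["features"], c["labels"], dr)
        for c, dr in zip(clients, density_ratios)
    ])
    return float(np.sum(sizes / sizes.sum() * risks))
```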

MixTailor: Mixed Gradient Aggregation for Robust Learning Against Tailored Attacks

no code implementations • 16 Jul 2022 • Ali Ramezani-Kebrya, Iman Tabrizian, Fartash Faghri, Petar Popovski

We introduce MixTailor, a scheme based on randomizing the aggregation strategy, which makes it impossible for the attacker to be fully informed.
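To make the idea concrete, the sketch below samples an aggregation rule at random from a small pool of standard robust aggregators at each round, so an attacker cannot tailor its updates to one fixed, known rule; the particular pool and sampling distribution are illustrative choices, not necessarily those used in MixTailor.

```python
import numpy as np

def trimmed_mean(grads, trim=1):
    """Coordinate-wise trimmed mean: drop the `trim` largest and smallest
    values in each coordinate before averaging."""
    s = np.sort(grads, axis=0)
    return s[trim:len(grads) - trim].mean(axis=0)

AGGREGATORS = [
    lambda g: np.mean(g, axis=0),    # plain averaging
    lambda g: np.median(g, axis=0),  # coordinate-wise median
    trimmed_mean,                    # coordinate-wise trimmed mean
]

def randomized_aggregate(worker_grads, rng):
    """Sample an aggregation rule at random each round, so an attacker
    cannot tailor malicious updates to one fixed, known rule."""
    grads = np.stack(worker_grads)
    rule = AGGREGATORS[rng.integers(len(AGGREGATORS))]
    return rule(grads)

# Usage: rng = np.random.default_rng(0); g = randomized_aggregate(grads_list, rng)
```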

Subquadratic Overparameterization for Shallow Neural Networks

no code implementations • NeurIPS 2021 • ChaeHwan Song, Ali Ramezani-Kebrya, Thomas Pethick, Armin Eftekhari, Volkan Cevher

Overparameterization refers to the important phenomenon where the width of a neural network is chosen such that learning algorithms can provably attain zero loss in nonconvex training.
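For readers new to this line of work, "shallow" means a single hidden layer, and the overparameterization knob is the hidden width m. The snippet below only fixes that notation (with an assumed Gaussian initialization); it does not encode the paper's specific subquadratic width bound.

```python
import numpy as np

def shallow_relu_net(x, W, a):
    """One-hidden-layer (shallow) ReLU network f(x) = a^T ReLU(W x).
    The hidden width m = W.shape[0] is the overparameterization knob: the
    question is how large m must be, relative to the number of training
    samples, for training to provably reach zero loss."""
    return a @ np.maximum(W @ x, 0.0)

d, m = 10, 256                              # input dimension, hidden width
rng = np.random.default_rng(0)
W = rng.normal(size=(m, d)) / np.sqrt(d)    # hidden-layer weights
a = rng.normal(size=m) / np.sqrt(m)         # output weights
print(shallow_relu_net(rng.normal(size=d), W, a))
```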

Linear Convergence of SGD on Overparametrized Shallow Neural Networks

no code implementations • 29 Sep 2021 • Paul Rolland, Ali Ramezani-Kebrya, ChaeHwan Song, Fabian Latorre, Volkan Cevher

Despite the non-convex landscape, first-order methods can be shown to reach global minima when training overparameterized neural networks, where the number of parameters far exceeds the number of training samples.

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

no code implementations • 28 Apr 2021 • Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training.

Quantization
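As a rough illustration of the idea, the sketch below quantizes a gradient with exponentially spaced (nonuniform) levels and unbiased stochastic rounding; the actual level placement, sign/norm encoding, and bit packing in NUQSGD differ, so treat this as generic gradient quantization rather than the paper's scheme.

```python
import numpy as np

def quantize_nonuniform(grad, num_levels=4, rng=None):
    """Quantize a gradient with exponentially spaced (nonuniform) levels.
    Levels are {0, 2^-(L-1), ..., 2^-1, 1} times ||grad||; each normalized
    magnitude is stochastically rounded to a neighboring level, so the
    quantizer is unbiased. Illustrative only, not NUQSGD's exact encoding."""
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(grad)
    if norm == 0:
        return grad
    levels = np.concatenate(([0.0], 2.0 ** np.arange(-(num_levels - 1), 1)))
    r = np.abs(grad) / norm                     # normalized magnitudes in [0, 1]
    idx = np.clip(np.searchsorted(levels, r, side="right") - 1, 0, len(levels) - 2)
    lo, hi = levels[idx], levels[idx + 1]
    p_up = (r - lo) / (hi - lo)                 # probability of rounding up
    rounded = np.where(rng.random(r.shape) < p_up, hi, lo)
    return norm * np.sign(grad) * rounded

# Usage: q = quantize_nonuniform(np.array([0.9, -0.05, 0.3]))
```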

On the Generalization of Stochastic Gradient Descent with Momentum

no code implementations • 26 Feb 2021 • Ali Ramezani-Kebrya, Ashish Khisti, Ben Liang

While momentum-based methods, in conjunction with stochastic gradient descent (SGD), are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods.

BIG-bench Machine Learning
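For reference, the update studied in this line of work is (a variant of) SGD with heavy-ball momentum; the generic form is sketched below, with the caveat that the exact parameterization analyzed in each paper may differ.

```python
import numpy as np

def sgd_momentum_step(w, v, stochastic_grad, lr=0.01, momentum=0.9):
    """One heavy-ball momentum step:
        v_{t+1} = momentum * v_t + g_t      (g_t: stochastic gradient at w_t)
        w_{t+1} = w_t - lr * v_{t+1}
    Stability-based analyses bound how much this trajectory changes when a
    single training example is replaced, which controls generalization error."""
    g = stochastic_grad(w)
    v = momentum * v + g
    return w - lr * v, v

# Toy usage on the quadratic loss 0.5 * ||w||^2 with noisy gradients.
rng = np.random.default_rng(0)
grad = lambda w: w + 0.01 * rng.normal(size=w.shape)
w, v = np.ones(3), np.zeros(3)
for _ in range(500):
    w, v = sgd_momentum_step(w, v, grad)
print(w)  # close to the minimizer at the origin
```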

Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

no code implementations • 25 Sep 2019 • Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.

Quantization

NUQSGD: Improved Communication Efficiency for Data-parallel SGD via Nonuniform Quantization

1 code implementation • 16 Aug 2019 • Ali Ramezani-Kebrya, Fartash Faghri, Daniel M. Roy

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed on clusters to perform model fitting in parallel.

Quantization

Stability of Stochastic Gradient Method with Momentum for Strongly Convex Loss Functions

no code implementations • ICLR 2019 • Ali Ramezani-Kebrya, Ashish Khisti, Ben Liang

While momentum-based methods, in conjunction with stochastic gradient descent, are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods.

On the Generalization of Stochastic Gradient Descent with Momentum

no code implementations • 12 Sep 2018 • Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher, Ashish Khisti, Ben Liang

While momentum-based accelerated variants of stochastic gradient descent (SGD) are widely used when training machine learning models, there is little theoretical understanding of the generalization error of such methods.

BIG-bench Machine Learning
