no code implementations • ICLR 2020 • Saar Barkai, Ido Hakimi, Assaf Schuster
In this paper we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous distributed method that penalizes stale gradients linearly in the Gap and performs well even when scaling to a large number of workers.
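As a rough illustration of the idea, here is a minimal NumPy sketch of a Gap-Aware-style update. It is not the paper's implementation: the function name `gap_aware_update` is hypothetical, and the Gap is approximated here by a simple staleness count (master updates applied since the worker read the parameters), whereas the paper defines a more refined measure.

```python
import numpy as np

def gap_aware_update(params, grad, master_step, worker_read_step, lr=0.1):
    # Staleness proxy for the Gap (assumption): number of master updates
    # applied since this worker read its copy of the parameters.
    gap = max(1, master_step - worker_read_step)
    # Linear penalty: a gradient that is `gap` steps stale is divided by `gap`.
    penalized_grad = grad / gap
    return params - lr * penalized_grad

# Example: a gradient arriving 4 master steps late is scaled down by 4.
params = np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])
new_params = gap_aware_update(params, grad, master_step=10, worker_read_step=6)
```

With this scaling, fresh gradients (Gap of 1) pass through unchanged, while very stale gradients contribute proportionally less to the update.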
no code implementations • 26 Jul 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA, a novel technique for asynchronous distributed SGD with momentum that mitigates gradient staleness by computing the gradient at an estimated future position of the model's parameters.
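To make the lookahead idea concrete, here is a minimal NumPy sketch. The estimator is an illustrative heuristic, not the paper's: `dana_gradient` is a hypothetical name, and the future position is extrapolated here by simply unrolling the shared momentum once per expected update from the other workers.

```python
import numpy as np

def dana_gradient(params, velocity, grad_fn, n_workers, lr=0.1, beta=0.9):
    # Extrapolate where the master's parameters will be by the time this
    # worker's gradient arrives: advance along the momentum direction once
    # per update expected from the other workers (illustrative assumption).
    lookahead = params.copy()
    v = velocity.copy()
    for _ in range(n_workers - 1):
        lookahead -= lr * v   # one momentum-driven step by another worker
        v *= beta             # momentum decays between updates
    # Evaluate the gradient at the estimated future position, not the stale copy.
    return grad_fn(lookahead)

# Example with a simple quadratic loss f(x) = ||x||^2 / 2, so grad f(x) = x.
params = np.ones(3)
velocity = np.full(3, 0.2)
g = dana_gradient(params, velocity, grad_fn=lambda x: x, n_workers=8)
```

The design intuition is Nesterov-like: since the gradient will be applied to a model that has already moved on, it is evaluated where the model is expected to be rather than where it was read.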
no code implementations • ICLR 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA, a novel approach that scales out of the box to large clusters using the same hyperparameters and learning schedule optimized for training on a single worker, while maintaining similar final accuracy and adding no extra overhead.