no code implementations • 27 Apr 2023 • Yonatan Bitton, Shlomi Cohen-Ganor, Ido Hakimi, Yoad Lewenberg, Roee Aharoni, Enav Weinreb
One of the exciting capabilities of recent language models for dialog is their ability to independently search for relevant information to ground a given dialog response.
1 code implementation • USENIX Annual Technical Conference 2021 • Saar Eliad, Ido Hakimi, Alon De Jager, Mark Silberstein, Assaf Schuster
Fine-tuning is an increasingly common technique that leverages transfer learning to dramatically expedite the training of huge, high-quality models.
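As a minimal illustration of the fine-tuning recipe this entry builds on (a sketch only, not the paper's pipeline-parallel fine-tuning system; the torchvision model and hyperparameters are assumptions), a pretrained backbone can be frozen and only a new task-specific head trained:

```python
# Minimal fine-tuning sketch: reuse pretrained weights, train a new head.
# Illustrative only; the model choice and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1")        # pretrained backbone
for p in model.parameters():
    p.requires_grad = False                      # freeze transferred layers
model.fc = nn.Linear(model.fc.in_features, 10)   # fresh head for the new task

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def fine_tune_step(x, y):
    """One optimization step that updates only the new head."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```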
no code implementations • 23 Jun 2021 • Rotem Zamir Aviv, Ido Hakimi, Assaf Schuster, Kfir Y. Levy
We consider stochastic convex optimization problems, where several machines act asynchronously in parallel while sharing a common memory.
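A hedged sketch of this setting (illustrative only, not the paper's algorithm or analysis): several workers share one parameter vector in memory and apply stochastic gradient updates without waiting for each other. In CPython the GIL serializes the updates, so the code only illustrates the access pattern:

```python
# Asynchronous SGD on a shared iterate for a toy least-squares problem.
# Lock-free, Hogwild!-style updates; all names and constants are assumptions.
import numpy as np
import threading

dim, n_workers, steps, lr = 10, 4, 1000, 0.01
A = np.random.randn(200, dim)
b = A @ np.random.randn(dim) + 0.1 * np.random.randn(200)
w = np.zeros(dim)                      # parameters in shared memory

def stochastic_grad(w_read):
    i = np.random.randint(len(b))      # sample one data point
    return (A[i] @ w_read - b[i]) * A[i]

def worker():
    global w
    for _ in range(steps):
        g = stochastic_grad(w)         # gradient at a possibly stale iterate
        w = w - lr * g                 # write back without waiting for others

threads = [threading.Thread(target=worker) for _ in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final least-squares loss:", 0.5 * np.mean((A @ w - b) ** 2))
```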
no code implementations • ICLR 2020 • Saar Barkai, Ido Hakimi, Assaf Schuster
In this paper we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous-distributed method that penalizes stale gradients linearly in the Gap and performs well even when scaling to large numbers of workers.
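A hedged sketch of the idea (the paper's precise definition of the Gap and the exact penalty are not reproduced here; the Gap is treated as a plain scalar staleness measure): the parameter server scales a delayed gradient down in proportion to the Gap before applying it:

```python
# Toy Gap-Aware-style update: penalize a stale gradient linearly in the Gap.
# Illustrative only; the Gap is passed in as a precomputed scalar.
import numpy as np

def gap_aware_update(w_master, worker_grad, gap, lr):
    penalized = worker_grad / max(gap, 1.0)   # linear penalty in the Gap
    return w_master - lr * penalized

# usage: a gradient whose Gap is 3 is scaled by 1/3 before being applied
w = np.ones(5)
g = np.full(5, 0.6)
w = gap_aware_update(w, g, gap=3.0, lr=0.1)
```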
no code implementations • 26 Jul 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA: a novel technique for asynchronous distributed SGD with momentum that mitigates gradient staleness by computing the gradient on an estimated future position of the model's parameters.
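A hedged sketch of the mechanism (the extrapolation rule and constants here are assumptions, not the authors' exact formulation): a worker estimates where the master's parameters will be once the pending momentum-driven updates land, and computes its gradient at that future point:

```python
# Toy DANA-style lookahead: evaluate the gradient at an estimated future
# position of the master parameters instead of at their current value.
import numpy as np

def lookahead_gradient(w_master, momentum, grad_fn, n_workers, gamma=0.9):
    # assume roughly n_workers momentum-driven updates will be applied
    # before this worker's gradient reaches the master
    w_future = w_master - n_workers * gamma * momentum
    return grad_fn(w_future)

# usage on a toy quadratic objective f(w) = 0.5 * ||w||^2, grad f(w) = w
grad_fn = lambda w: w
w, m = np.ones(4), 0.1 * np.ones(4)
g = lookahead_gradient(w, m, grad_fn, n_workers=8)
```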
no code implementations • ICLR 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA, a novel approach that scales out-of-the-box to large clusters using the same hyperparameters and learning schedule optimized for training on a single worker, while maintaining similar final accuracy without additional overhead.
1 code implementation • NeurIPS 2021 • Menachem Adelman, Kfir Y. Levy, Ido Hakimi, Mark Silberstein
We propose a novel technique for faster deep neural network training which systematically applies sample-based approximation to the constituent tensor operations, i.e., matrix multiplications and convolutions.
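A hedged sketch of the underlying primitive (the paper's exact sampling scheme and scaling may differ; this is the standard norm-proportional column-row sampling estimator of a matrix product):

```python
# Approximate A @ B by summing k sampled column-row outer products,
# scaled so the estimate is unbiased. Illustrative only.
import numpy as np

def sampled_matmul(A, B, k, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n = A.shape[1]                                         # inner dimension
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = norms / norms.sum()                            # sampling distribution
    idx = rng.choice(n, size=k, replace=True, p=probs)     # sampled pairs
    scale = 1.0 / (k * probs[idx])                         # importance weights
    return (A[:, idx] * scale) @ B[idx, :]

# usage: compare against the exact product
A, B = np.random.randn(64, 512), np.random.randn(512, 64)
exact, approx = A @ B, sampled_matmul(A, B, k=128)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```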