no code implementations • 27 Apr 2023 • Yonatan Bitton, Shlomi Cohen-Ganor, Ido Hakimi, Yoad Lewenberg, Roee Aharoni, Enav Weinreb
One of the exciting capabilities of recent language models for dialog is their ability to independently search for relevant information to ground a given dialog response.
1 code implementation • USENIX Annual Technical Conference 2021 • Saar Eliad, Ido Hakimi, Alon De Jager, Mark Silberstein, Assaf Schuster
Fine-tuning is an increasingly common technique that leverages transfer learning to dramatically expedite the training of huge, high-quality models.
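As a minimal illustration of the fine-tuning recipe this entry builds on (a sketch only, not the paper's pipeline-parallel fine-tuning system; the torchvision model and hyperparameters are assumptions), a pretrained backbone can be frozen and only a new task-specific head trained:

```python
# Minimal fine-tuning sketch: reuse pretrained weights, train a new head.
# Illustrative only; the model choice and hyperparameters are assumptions.
import torch
import torch.nn as nn
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1")        # pretrained backbone
for p in model.parameters():
    p.requires_grad = False                      # freeze transferred layers
model.fc = nn.Linear(model.fc.in_features, 10)   # fresh head for the new task

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

def fine_tune_step(x, y):
    """One optimization step that updates only the new head."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```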
no code implementations • 23 Jun 2021 • Rotem Zamir Aviv, Ido Hakimi, Assaf Schuster, Kfir Y. Levy
We consider stochastic convex optimization problems, where several machines act asynchronously in parallel while sharing a common memory.
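A hedged sketch of this setting (illustrative only, not the paper's algorithm or analysis): several workers share one parameter vector in memory and apply stochastic gradient updates without waiting for each other. In CPython the GIL serializes the updates, so the code only illustrates the access pattern:

```python
# Asynchronous SGD on a shared iterate for a toy least-squares problem.
# Lock-free, Hogwild!-style updates; all names and constants are assumptions.
import numpy as np
import threading

dim, n_workers, steps, lr = 10, 4, 1000, 0.01
A = np.random.randn(200, dim)
b = A @ np.random.randn(dim) + 0.1 * np.random.randn(200)
w = np.zeros(dim)                      # parameters in shared memory

def stochastic_grad(w_read):
    i = np.random.randint(len(b))      # sample one data point
    return (A[i] @ w_read - b[i]) * A[i]

def worker():
    global w
    for _ in range(steps):
        g = stochastic_grad(w)         # gradient at a possibly stale iterate
        w = w - lr * g                 # write back without waiting for others

threads = [threading.Thread(target=worker) for _ in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final least-squares loss:", 0.5 * np.mean((A @ w - b) ** 2))
```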
no code implementations • ICLR 2020 • Saar Barkai, Ido Hakimi, Assaf Schuster
In this paper we define the Gap as a measure of gradient staleness and propose Gap-Aware (GA), a novel asynchronous-distributed method that penalizes stale gradients linearly in the Gap and performs well even when scaling to large numbers of workers.
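A hedged sketch of the idea (the paper's precise definition of the Gap and the exact penalty are not reproduced here; the Gap is treated as a plain scalar staleness measure): the parameter server scales a delayed gradient down in proportion to the Gap before applying it:

```python
# Toy Gap-Aware-style update: penalize a stale gradient linearly in the Gap.
# Illustrative only; the Gap is passed in as a precomputed scalar.
import numpy as np

def gap_aware_update(w_master, worker_grad, gap, lr):
    penalized = worker_grad / max(gap, 1.0)   # linear penalty in the Gap
    return w_master - lr * penalized

# usage: a gradient whose Gap is 3 is scaled by 1/3 before being applied
w = np.ones(5)
g = np.full(5, 0.6)
w = gap_aware_update(w, g, gap=3.0, lr=0.1)
```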
no code implementations • 26 Jul 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA: a novel technique for asynchronous distributed SGD with momentum that mitigates gradient staleness by computing the gradient on an estimated future position of the model's parameters.
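A hedged sketch of the mechanism (the extrapolation rule and constants here are assumptions, not the authors' exact formulation): a worker estimates where the master's parameters will be once the pending momentum-driven updates land, and computes its gradient at that future point:

```python
# Toy DANA-style lookahead: evaluate the gradient at an estimated future
# position of the master parameters instead of at their current value.
import numpy as np

def lookahead_gradient(w_master, momentum, grad_fn, n_workers, gamma=0.9):
    # assume roughly n_workers momentum-driven updates will be applied
    # before this worker's gradient reaches the master
    w_future = w_master - n_workers * gamma * momentum
    return grad_fn(w_future)

# usage on a toy quadratic objective f(w) = 0.5 * ||w||^2, grad f(w) = w
grad_fn = lambda w: w
w, m = np.ones(4), 0.1 * np.ones(4)
g = lookahead_gradient(w, m, grad_fn, n_workers=8)
```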
no code implementations • ICLR 2019 • Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster
We propose DANA, a novel approach that scales out-of-the-box to large clusters using the same hyperparameters and learning schedule optimized for training on a single worker, while maintaining similar final accuracy without additional overhead.
1 code implementation • NeurIPS 2021 • Menachem Adelman, Kfir Y. Levy, Ido Hakimi, Mark Silberstein
We propose a novel technique for faster deep neural network training which systematically applies sample-based approximation to the constituent tensor operations, i.e., matrix multiplications and convolutions.
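A hedged sketch of the underlying primitive (the paper's exact sampling scheme and scaling may differ; this is the standard norm-proportional column-row sampling estimator of a matrix product):

```python
# Approximate A @ B by summing k sampled column-row outer products,
# scaled so the estimate is unbiased. Illustrative only.
import numpy as np

def sampled_matmul(A, B, k, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n = A.shape[1]                                         # inner dimension
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    probs = norms / norms.sum()                            # sampling distribution
    idx = rng.choice(n, size=k, replace=True, p=probs)     # sampled pairs
    scale = 1.0 / (k * probs[idx])                         # importance weights
    return (A[:, idx] * scale) @ B[idx, :]

# usage: compare against the exact product
A, B = np.random.randn(64, 512), np.random.randn(512, 64)
exact, approx = A @ B, sampled_matmul(A, B, k=128)
print("relative error:", np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```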