no code implementations • 20 Jan 2024 • Sami Alabed, Daniel Belov, Bart Chrzaszcz, Juliana Franco, Dominik Grewe, Dougal Maclaurin, James Molloy, Tom Natan, Tamara Norman, Xiaoyue Pan, Adam Paszke, Norman A. Rink, Michael Schaarschmidt, Timur Sitdikov, Agnieszka Swietlik, Dimitrios Vytiniotis, Joel Wee
Training of modern large neural networks (NNs) requires a combination of parallelization strategies encompassing data, model, or optimizer sharding.
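The difference between data and model sharding can be illustrated with a toy example. The sketch below is purely illustrative (the shapes, the naive `matmul`, and the two-"device" split are assumptions, not taken from the paper): it shards one linear layer first along the batch (data parallelism) and then along the weight columns (model parallelism), and checks that both recover the unsharded result.

```python
# Hypothetical sketch: splitting one linear layer's work under data
# parallelism (shard the batch) vs. model parallelism (shard the weights).
# All names and shapes are illustrative, not from the paper.

def matmul(x, w):
    """Naive matrix multiply: x is (m, k), w is (k, n) -> (m, n)."""
    return [[sum(x[i][t] * w[t][j] for t in range(len(w)))
             for j in range(len(w[0]))] for i in range(len(x))]

# A toy batch of 4 examples with 2 features, and a 2x2 weight matrix.
batch = [[1, 2], [3, 4], [5, 6], [7, 8]]
weights = [[1, 0], [0, 1]]

# Data sharding: each "device" holds the full weights but half the batch.
shard_a, shard_b = batch[:2], batch[2:]
data_parallel = matmul(shard_a, weights) + matmul(shard_b, weights)

# Model sharding: each "device" holds the full batch but half the weight columns.
cols_a = [[row[0]] for row in weights]
cols_b = [[row[1]] for row in weights]
out_a, out_b = matmul(batch, cols_a), matmul(batch, cols_b)
model_parallel = [ra + rb for ra, rb in zip(out_a, out_b)]

# Both strategies reconstruct the same result as the unsharded computation.
assert data_parallel == model_parallel == matmul(batch, weights)
```

In practice the interesting problem, which this line of work automates, is choosing which of these decompositions to apply to each operator in a large program.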
no code implementations • 7 Oct 2022 • Sami Alabed, Dominik Grewe, Juliana Franco, Bart Chrzaszcz, Tom Natan, Tamara Norman, Norman A. Rink, Dimitrios Vytiniotis, Michael Schaarschmidt
Large neural network models are commonly trained through a combination of advanced parallelism strategies in the single-program, multiple-data (SPMD) paradigm.
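The core SPMD idea can be sketched in a few lines: every "device" runs the *same* program on a *different* shard of the data, then results are combined with a collective. The simulated devices, the toy per-shard computation, and the sum-based all-reduce below are illustrative stand-ins, not the paper's machinery.

```python
# Hedged sketch of SPMD execution: one program, many data shards, plus a
# collective (an all-reduce implemented as a plain sum). The "devices" are
# simulated by a Python list; the local computation is a toy stand-in.

def local_step(shard):
    """The single program: compute a partial result on one device's shard."""
    return sum(x * x for x in shard)  # toy per-shard contribution

def all_reduce(partials):
    """Collective that sums partial results and replicates the total."""
    total = sum(partials)
    return [total] * len(partials)  # every device receives the same value

data = [1.0, 2.0, 3.0, 4.0]
shards = [data[:2], data[2:]]               # one shard per simulated device
partials = [local_step(s) for s in shards]  # same program, different data
synced = all_reduce(partials)

assert synced == [30.0, 30.0]  # 1 + 4 + 9 + 16, replicated to both devices
```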
1 code implementation • 3 May 2022 • Sean Parker, Sami Alabed, Eiko Yoneki
Our proposed approach, RLFlow, can learn to perform neural network subgraph transformations without the need for expertly designed heuristics, while achieving a high level of performance.
Tags: Model-based Reinforcement Learning, reinforcement-learning, +1
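To make "subgraph transformation" concrete, here is a hypothetical rewrite rule of the kind such a system would choose among: cancelling adjacent transpose-transpose pairs in a sequential operator graph. The list-of-strings graph encoding and this particular rule are illustrative assumptions, not RLFlow's actual representation; in RLFlow an agent learns which rewrites to apply, with the resulting runtime as feedback.

```python
# Illustrative subgraph rewrite: transpose followed by transpose is the
# identity, so adjacent pairs can be removed. The op encoding (a flat list
# of op names) is a hypothetical simplification.

def cancel_double_transpose(ops):
    """Apply the rewrite rule over a sequential operator graph."""
    out = []
    for op in ops:
        if out and out[-1] == "transpose" and op == "transpose":
            out.pop()          # the pair cancels; emit neither op
        else:
            out.append(op)
    return out

graph = ["matmul", "transpose", "transpose", "relu", "transpose"]
optimized = cancel_double_transpose(graph)
assert optimized == ["matmul", "relu", "transpose"]
```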
no code implementations • 16 Dec 2021 • Sami Alabed, Eiko Yoneki
Then it applies the expert-provided knowledge to the graph to further contextualize the system behavior.
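One simple way to encode such expert-provided knowledge is as a dependency graph whose edges state which tunable parameters influence which internal subsystems, and which subsystems influence the objective. The parameter and subsystem names below are illustrative assumptions, not the paper's actual structure.

```python
# Hypothetical expert-knowledge graph: parameter -> subsystem -> objective
# edges, stored as an adjacency dict. All node names are illustrative.

expert_dag = {
    "write_buffer_size": ["memtable"],   # parameter -> subsystem
    "compression_type": ["compaction"],
    "memtable": ["throughput"],          # subsystem -> objective
    "compaction": ["throughput"],
}

def influences(node, target, dag):
    """Does `node` reach `target` via the expert-provided edges?"""
    frontier, seen = [node], set()
    while frontier:
        n = frontier.pop()
        if n == target:
            return True
        if n not in seen:
            seen.add(n)
            frontier.extend(dag.get(n, []))
    return False

assert influences("write_buffer_size", "throughput", expert_dag)
assert not influences("throughput", "write_buffer_size", expert_dag)
```

Structuring the search space this way lets an optimizer reason about which parameters matter for which observed metrics, rather than treating the system as a single black box.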
no code implementations • 30 Mar 2021 • Sami Alabed, Eiko Yoneki
The model is then incorporated in a standard Bayesian Optimization loop to find parameters that maximize RocksDB's I/O throughput.
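The shape of such a tuning loop can be sketched as follows. This is a heavily simplified, hedged stand-in: the throughput function is mocked, the parameter names are illustrative, and a nearest-neighbour surrogate with a distance-based exploration bonus substitutes for a real probabilistic model; only the propose-score-evaluate-update loop structure mirrors Bayesian Optimization.

```python
import random

# Toy Bayesian-Optimization-style tuning loop. The RocksDB throughput
# function is mocked, and a nearest-neighbour surrogate with an exploration
# bonus stands in for a real probabilistic surrogate model.

random.seed(0)

def mock_throughput(write_buffer_mb, background_jobs):
    """Stand-in objective with a peak near (256 MB, 4 jobs)."""
    return 1000 - (write_buffer_mb - 256) ** 2 / 100 - (background_jobs - 4) ** 2 * 10

def surrogate_score(cand, history):
    """Predicted value (nearest observed score) plus an exploration bonus."""
    dists = [(abs(cand[0] - p[0]) + 50 * abs(cand[1] - p[1]), y) for p, y in history]
    d, y = min(dists)
    return y + 0.5 * d  # farther from observed data -> more optimistic

history = [((64, 1), mock_throughput(64, 1))]   # one initial observation
for _ in range(30):
    # Propose candidates, pick the one the surrogate likes best, evaluate it.
    candidates = [(random.randint(32, 512), random.randint(1, 8)) for _ in range(20)]
    best_cand = max(candidates, key=lambda c: surrogate_score(c, history))
    history.append((best_cand, mock_throughput(*best_cand)))

best_params, best_score = max(history, key=lambda h: h[1])
# The loop should improve on the initial configuration.
assert best_score > mock_throughput(64, 1)
```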
no code implementations • 30 Sep 2019 • Sami Alabed
We also introduce a framework to simplify the modelling of computer systems problems as a reinforcement learning task.
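A common way to frame a systems problem as a reinforcement learning task is through a Gym-style interface: observations are system metrics, actions adjust configuration knobs, and the reward reflects measured performance. The toy environment below (cache-size tuning with a known optimum) is an illustrative assumption, not the paper's framework.

```python
# Hedged sketch of a systems-tuning problem as an RL task via a Gym-style
# reset/step interface. The environment dynamics are a toy stand-in.

class CacheSizeEnv:
    """Toy environment: pick a cache size; reward peaks at a fixed optimum."""

    OPTIMUM = 8

    def reset(self):
        self.cache_size = 1
        return self._observe()

    def step(self, action):
        # action: -1 shrink, 0 keep, +1 grow the cache
        self.cache_size = max(1, min(16, self.cache_size + action))
        reward = -abs(self.cache_size - self.OPTIMUM)  # throughput proxy
        done = self.cache_size == self.OPTIMUM
        return self._observe(), reward, done

    def _observe(self):
        return {"cache_size": self.cache_size}

env = CacheSizeEnv()
obs = env.reset()
done, steps = False, 0
while not done:                 # trivial "policy": always grow the cache
    obs, reward, done = env.step(+1)
    steps += 1
assert steps == 7 and obs["cache_size"] == 8
```

Once a problem is expressed behind this interface, any off-the-shelf RL algorithm can be pointed at it, which is the kind of reuse such a framework aims to enable.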