no code implementations • 20 Jan 2024 • Sami Alabed, Daniel Belov, Bart Chrzaszcz, Juliana Franco, Dominik Grewe, Dougal Maclaurin, James Molloy, Tom Natan, Tamara Norman, Xiaoyue Pan, Adam Paszke, Norman A. Rink, Michael Schaarschmidt, Timur Sitdikov, Agnieszka Swietlik, Dimitrios Vytiniotis, Joel Wee
Training modern large neural networks (NNs) requires a combination of parallelization strategies encompassing data, model, and optimizer sharding.
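No code accompanies this entry, but as a rough illustration of what combining data and model sharding looks like in JAX; the mesh shape and axis names ('data', 'model') are illustrative choices, not this paper's API:

```python
# Minimal sketch: data-sharded activations and model-sharded weights on a
# device mesh. Mesh shape and axis names are illustrative, not the paper's API.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# An (n_devices x 1) mesh; real runs would pick a 2D factorization.
mesh = Mesh(np.array(jax.devices()).reshape(-1, 1), axis_names=('data', 'model'))

# Shard batch rows over 'data' and weight columns over 'model'
# (assumes the batch dimension divides the 'data' axis size).
batch = jax.device_put(jnp.ones((8, 128)), NamedSharding(mesh, P('data', None)))
weights = jax.device_put(jnp.ones((128, 256)), NamedSharding(mesh, P(None, 'model')))

@jax.jit
def layer(x, w):
    # XLA's SPMD partitioner propagates the input shardings through the matmul.
    return jnp.dot(x, w)

out = layer(batch, weights)  # result sharded over (data, model), no manual collectives
```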
no code implementations • 7 Oct 2022 • Sami Alabed, Dominik Grewe, Juliana Franco, Bart Chrzaszcz, Tom Natan, Tamara Norman, Norman A. Rink, Dimitrios Vytiniotis, Michael Schaarschmidt
Large neural network models are commonly trained through a combination of advanced parallelism strategies within the single-program, multiple-data (SPMD) paradigm.
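A second facet of the same idea: within one jitted SPMD program, intermediate values can be annotated so the partitioner composes layouts around them. A minimal JAX sketch (the axis name is again an illustrative choice, not this paper's API):

```python
# Sketch: composing parallelism inside a single SPMD program by constraining
# an intermediate value's sharding. The 'model' axis name is illustrative.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(np.array(jax.devices()), axis_names=('model',))

@jax.jit
def mlp(x, w1, w2):
    h = jnp.dot(x, w1)
    # Pin the hidden activations to be sharded over 'model'; the partitioner
    # inserts whatever collectives this layout change requires.
    h = jax.lax.with_sharding_constraint(h, NamedSharding(mesh, P(None, 'model')))
    return jnp.dot(h, w2)

y = mlp(jnp.ones((8, 64)), jnp.ones((64, 256)), jnp.ones((256, 64)))
```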
no code implementations • 26 Sep 2022 • Michael Schaarschmidt, Morgane Riviere, Alex M. Ganose, James S. Spencer, Alexander L. Gaunt, James Kirkpatrick, Simon Axelrod, Peter W. Battaglia, Jonathan Godwin
We present evidence that learned density functional theory ("DFT") force fields are ready for ground state catalyst discovery.
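In practice, ground state discovery means relaxing candidate structures under the learned potential: forces are the negative gradient of the predicted energy, followed downhill. A toy JAX sketch, with a placeholder pair potential standing in for a trained model:

```python
# Toy sketch of ground-state relaxation under a learned potential: forces are
# the negative gradient of predicted energy w.r.t. atomic positions.
# `energy_fn` is a placeholder pair potential, not a trained DFT surrogate.
import jax
import jax.numpy as jnp

def energy_fn(positions):
    diff = positions[:, None, :] - positions[None, :, :]
    d2 = jnp.sum(diff ** 2, axis=-1)
    mask = ~jnp.eye(positions.shape[0], dtype=bool)
    d2 = jnp.where(mask, d2, 1.0)            # keep the diagonal finite
    inv6 = 1.0 / d2 ** 3                     # Lennard-Jones-style pair energy
    return 0.5 * jnp.sum(jnp.where(mask, inv6 ** 2 - 2.0 * inv6, 0.0))

forces = jax.grad(lambda p: -energy_fn(p))   # F = -dE/dx

@jax.jit
def relax_step(positions, step_size=1e-3):
    return positions + step_size * forces(positions)

positions = 2.0 * jax.random.normal(jax.random.PRNGKey(0), (8, 3))
for _ in range(500):
    positions = relax_step(positions)        # descend toward a relaxed structure
```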
1 code implementation • 31 May 2022 • Sheheryar Zaidi, Michael Schaarschmidt, James Martens, Hyunjik Kim, Yee Whye Teh, Alvaro Sanchez-Gonzalez, Peter Battaglia, Razvan Pascanu, Jonathan Godwin
Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks.
no code implementations • 6 Dec 2021 • Michael Schaarschmidt, Dominik Grewe, Dimitrios Vytiniotis, Adam Paszke, Georg Stefan Schmid, Tamara Norman, James Molloy, Jonathan Godwin, Norman Alexander Rink, Vinod Nair, Dan Belov
The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism.
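For a feel of the pipeline case, here is a toy GPipe-style schedule in which stage s consumes microbatch m at logical step s + m; this is a plain simulation of the schedule, not this paper's system:

```python
# Toy GPipe-style pipeline schedule: stage s consumes microbatch m at logical
# step s + m, so different stages overlap on different microbatches. A real
# system would place each stage on its own device; this only shows the schedule.
import jax.numpy as jnp

stages = [lambda x: x * 2.0, lambda x: x + 1.0, lambda x: x ** 2]
microbatches = [jnp.full((4,), float(m)) for m in range(4)]
n_stages, n_micro = len(stages), len(microbatches)

buffers = {(-1, m): mb for m, mb in enumerate(microbatches)}  # stage -1 = input
for step in range(n_stages + n_micro - 1):
    for s in range(n_stages):
        m = step - s
        if 0 <= m < n_micro:                  # stage s has its input ready
            buffers[(s, m)] = stages[s](buffers[(s - 1, m)])

outputs = [buffers[(n_stages - 1, m)] for m in range(n_micro)]
```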
no code implementations • ICLR 2022 • Jonathan Godwin, Michael Schaarschmidt, Alexander L Gaunt, Alvaro Sanchez-Gonzalez, Yulia Rubanova, Petar Veličković, James Kirkpatrick, Peter Battaglia
We introduce “Noisy Nodes”, a very simple technique for improved training of GNNs, in which we corrupt the input graph with noise and add a noise-correcting node-level loss.
Initial Structure to Relaxed Energy (IS2RE), Direct Molecular Property Prediction
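No code is linked for this entry; a minimal JAX sketch of the Noisy Nodes recipe, with a per-node stand-in network in place of a real GNN:

```python
# Minimal sketch of Noisy Nodes: corrupt node features with Gaussian noise and
# add an auxiliary node-level loss that predicts the injected noise
# (equivalently, reconstructs the clean inputs).
# `node_net` is a stand-in for a real GNN; shapes and targets are toy values.
import jax
import jax.numpy as jnp

def node_net(params, x):
    # Per-node MLP standing in for message passing.
    h = jnp.tanh(x @ params['w1'])
    return h @ params['w_task'], h @ params['w_denoise']

def noisy_nodes_loss(params, key, x, y, sigma=0.1, aux_weight=1.0):
    noise = sigma * jax.random.normal(key, x.shape)
    task_pred, noise_pred = node_net(params, x + noise)
    task_loss = jnp.mean((task_pred - y) ** 2)
    denoise_loss = jnp.mean((noise_pred - noise) ** 2)  # node-level auxiliary loss
    return task_loss + aux_weight * denoise_loss

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (16, 8))    # 16 nodes, 8 features each
y = jnp.zeros((16, 1))                 # toy node-level targets
params = {
    'w1': 0.1 * jax.random.normal(key, (8, 32)),
    'w_task': jnp.zeros((32, 1)),
    'w_denoise': jnp.zeros((32, 8)),
}
grads = jax.grad(noisy_nodes_loss)(params, key, x, y)
```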
1 code implementation • 15 Jun 2021 • Jonathan Godwin, Michael Schaarschmidt, Alexander Gaunt, Alvaro Sanchez-Gonzalez, Yulia Rubanova, Petar Veličković, James Kirkpatrick, Peter Battaglia
From this observation we derive "Noisy Nodes", a simple technique in which we corrupt the input graph with noise and add a noise-correcting node-level loss.
Ranked #4 on Initial Structure to Relaxed Energy (IS2RE) on OC20
no code implementations • 16 Sep 2019 • Jeremy Welborn, Michael Schaarschmidt, Eiko Yoneki
Configuration spaces for computer systems can be challenging for traditional and automatic tuning strategies.
no code implementations • 15 Sep 2019 • Michael Schaarschmidt, Kai Fricke, Eiko Yoneki
Reinforcement learning frameworks have introduced abstractions to implement and execute algorithms at scale.
1 code implementation • 21 Oct 2018 • Michael Schaarschmidt, Sven Mika, Kai Fricke, Eiko Yoneki
Reinforcement learning (RL) tasks are challenging to implement, execute, and test due to algorithmic instability, hyper-parameter sensitivity, and heterogeneous distributed communication patterns.
4 code implementations • 23 Aug 2018 • Michael Schaarschmidt, Alexander Kuhnle, Ben Ellis, Kai Fricke, Felix Gessert, Eiko Yoneki
In this work, we introduce LIFT, an end-to-end software stack for applying deep reinforcement learning to data management tasks.
no code implementations • 1 Dec 2016 • Valentin Dalibard, Michael Schaarschmidt, Eiko Yoneki
We present an optimizer which uses Bayesian optimization to tune the system parameters of distributed stochastic gradient descent (SGD).
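As an illustration of the approach, a compact Bayesian-optimization loop with a Gaussian-process surrogate and expected improvement over one normalized system parameter; `measure_throughput` is a hypothetical stand-in for benchmarking a distributed SGD run, not this paper's optimizer:

```python
# Minimal Bayesian-optimization loop for one normalized SGD system parameter
# (say, a per-worker batch-size fraction): GP surrogate + expected improvement.
# `measure_throughput` is a hypothetical stand-in for a real benchmark run.
import jax.numpy as jnp
from jax.scipy.stats import norm

def measure_throughput(x):
    # Stand-in for launching distributed SGD with parameter x and timing it.
    return -(x - 0.6) ** 2 + 0.05 * jnp.sin(20 * x)

def rbf(a, b, ls=0.1):
    return jnp.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_query, jitter=1e-4):
    k = rbf(x_obs, x_obs) + jitter * jnp.eye(x_obs.shape[0])
    k_s = rbf(x_obs, x_query)
    mu = k_s.T @ jnp.linalg.solve(k, y_obs)
    var = 1.0 - jnp.sum(k_s * jnp.linalg.solve(k, k_s), axis=0)
    return mu, jnp.sqrt(jnp.clip(var, 1e-9))

def expected_improvement(mu, sigma, best):
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

grid = jnp.linspace(0.0, 1.0, 256)     # normalized parameter range
x_obs = jnp.array([0.2, 0.8])          # two seed measurements
y_obs = measure_throughput(x_obs)
for _ in range(10):
    mu, sigma = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[jnp.argmax(expected_improvement(mu, sigma, y_obs.max()))]
    x_obs = jnp.append(x_obs, x_next)
    y_obs = jnp.append(y_obs, measure_throughput(x_next))
```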
no code implementations • 31 Oct 2016 • Michael Schaarschmidt, Felix Gessert, Valentin Dalibard, Eiko Yoneki
This paper investigates the use of deep reinforcement learning to tune runtime parameters of cloud databases under latency constraints.
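A stateless, bandit-style reduction of the idea fits in a few lines: choose among discrete runtime configurations epsilon-greedily and penalize SLA violations in the reward. The environment below is hypothetical, not this paper's database setup:

```python
# Toy bandit-style sketch: epsilon-greedy choice among discrete runtime
# configurations, with latency SLA violations penalized in the reward.
# The "environment" (observe) is hypothetical, not the paper's setup.
import jax
import jax.numpy as jnp

configs = jnp.array([64.0, 128.0, 256.0, 512.0])  # e.g. candidate buffer sizes
SLA_MS = 20.0

def observe(config, key):
    # Hypothetical noisy throughput/latency measurements for a configuration.
    k1, k2 = jax.random.split(key)
    throughput = jnp.log(config) + 0.1 * jax.random.normal(k1)
    latency_ms = 0.05 * config + jax.random.normal(k2)
    return throughput, latency_ms

q = jnp.zeros(len(configs))                       # per-config value estimates
key = jax.random.PRNGKey(0)
for _ in range(500):
    key, k_eps, k_act, k_env = jax.random.split(key, 4)
    random_a = jax.random.randint(k_act, (), 0, len(configs))
    a = jnp.where(jax.random.uniform(k_eps) < 0.1, random_a, jnp.argmax(q))
    throughput, latency_ms = observe(configs[a], k_env)
    r = throughput - jnp.maximum(latency_ms - SLA_MS, 0.0)  # penalize SLA misses
    q = q.at[a].add(0.1 * (r - q[a]))
```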