no code implementations • 11 Sep 2023 • Johannes von Oswald, Eyvind Niklasson, Maximilian Schlegel, Seijin Kobayashi, Nicolas Zucchet, Nino Scherrer, Nolan Miller, Mark Sandler, Blaise Agüera y Arcas, Max Vladymyrov, Razvan Pascanu, João Sacramento
Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood.
no code implementations • 5 Jan 2023 • Mark Sandler, Andrey Zhmoginov, Max Vladymyrov, Nolan Miller
In particular, for Exponential Moving Average (EMA) and Stochastic Weight Averaging we show that our proposed model matches the observed training trajectories on ImageNet.
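As a rough illustration of the two averaging schemes named above, the sketch below applies an EMA update and a running uniform average (the form Stochastic Weight Averaging usually takes) to a toy weight trajectory. The function names, the decay value, and the toy trajectory are assumptions made for illustration only; this is not the model proposed in the paper.

```python
def ema_update(avg, weights, decay):
    """One EMA step: avg <- decay * avg + (1 - decay) * weights."""
    return [decay * a + (1.0 - decay) * w for a, w in zip(avg, weights)]


def swa_update(avg, weights, n_averaged):
    """Fold one new snapshot into a uniform mean of n_averaged earlier snapshots."""
    return [a + (w - a) / (n_averaged + 1) for a, w in zip(avg, weights)]


# Hypothetical two-parameter "training trajectory" (three weight snapshots).
trajectory = [[1.0, 0.0], [2.0, 2.0], [3.0, 4.0]]

ema = list(trajectory[0])
swa = list(trajectory[0])
for n, snapshot in enumerate(trajectory[1:], start=1):
    ema = ema_update(ema, snapshot, decay=0.5)    # decay chosen arbitrarily
    swa = swa_update(swa, snapshot, n_averaged=n)

print("EMA:", ema)
print("SWA:", swa)
```

The EMA weights recent snapshots more heavily, while the uniform average ends at the plain mean of all snapshots seen so far.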
no code implementations • CVPR 2023 • Andrey Zhmoginov, Mark Sandler, Nolan Miller, Gus Kristiansen, Max Vladymyrov
We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve their performance compared to learning in isolation.
1 code implementation • 10 Apr 2021 • Mark Sandler, Max Vladymyrov, Andrey Zhmoginov, Nolan Miller, Andrew Jackson, Tom Madams, Blaise Agüera y Arcas
We show that classical gradient-based backpropagation in neural networks can be seen as a special case of a two-state network where one state is used for activations and another for gradients, with update rules derived from the chain rule.
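The two-state view of backpropagation can be sketched on a scalar chain of units, where every layer holds one activation state (filled by the forward pass) and one gradient state (filled by the backward pass through the chain rule). The tanh nonlinearity, the squared-error loss, and all names below are assumptions chosen for illustration; the sketch covers only the classical special case, not the generalized update rules studied in the paper.

```python
import math


def forward(x, weights):
    """Fill the activation state of every layer: a[l+1] = tanh(w[l] * a[l])."""
    acts = [x]
    for w in weights:
        acts.append(math.tanh(w * acts[-1]))
    return acts


def backward(acts, weights, target):
    """Fill the gradient state of every layer, from the output back to the input."""
    grads = [0.0] * len(acts)
    grads[-1] = acts[-1] - target  # derivative of 0.5 * (a - target) ** 2
    weight_grads = [0.0] * len(weights)
    for l in reversed(range(len(weights))):
        # Chain rule through tanh: tanh'(z) = 1 - tanh(z) ** 2.
        local = grads[l + 1] * (1.0 - acts[l + 1] ** 2)
        weight_grads[l] = local * acts[l]  # gradient with respect to w[l]
        grads[l] = local * weights[l]      # gradient state passed to layer l
    return grads, weight_grads


def loss(x, weights, target):
    """Squared-error loss on the final activation."""
    return 0.5 * (forward(x, weights)[-1] - target) ** 2


# Hypothetical three-layer chain.
weights = [0.5, -1.2, 0.8]
acts = forward(0.3, weights)
grads, weight_grads = backward(acts, weights, target=0.1)
print("activation states:", acts)
print("gradient states:  ", grads)
```

Each layer ends up with a pair (activation, gradient), and the weight gradient is the product of the two states on either side of the connection, scaled by the local derivative.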
1 code implementation • 24 Nov 2020 • Nolan Miller, Logan C Carpenter, Evan Berkowitz, Chia Cheng Chang, Ben Hörz, Dean Howarth, Henry Monge-Camacho, Enrico Rinaldi, David A. Brantley, Christopher Körber, Chris Bouchard, M. A. Clark, Arjun Singh Gambhir, Christopher J. Monahan, Amy Nicholson, Pavlos Vranas, André Walker-Loud
We report on a sub-percent scale determination using the omega baryon mass and gradient-flow methods.
High Energy Physics - Lattice • High Energy Physics - Phenomenology • Nuclear Theory
1 code implementation • 10 May 2020 • Nolan Miller, Henry Monge-Camacho, Chia Cheng Chang, Ben Hörz, Enrico Rinaldi, Dean Howarth, Evan Berkowitz, David A. Brantley, Arjun Singh Gambhir, Christopher Körber, Christopher J. Monahan, M. A. Clark, Bálint Joó, Thorsten Kurth, Amy Nicholson, Kostas Orginos, Pavlos Vranas, André Walker-Loud
We report the results of a lattice quantum chromodynamics calculation of $F_K/F_\pi$ using Möbius domain-wall fermions computed on gradient-flowed $N_f=2+1+1$ highly-improved staggered quark (HISQ) ensembles.
High Energy Physics - Lattice • High Energy Physics - Experiment • High Energy Physics - Phenomenology • Nuclear Theory