1 code implementation • 15 Apr 2024 • Nicolas Wagner, Dongyang Fan, Martin Jaggi
We explore on-device self-supervised collaborative fine-tuning of large language models with limited local data availability.
no code implementations • 20 Feb 2024 • Dongyang Fan, Bettina Messmer, Martin Jaggi
In this study, we systematically evaluate the impact of common design choices in Mixture-of-Experts (MoE) models on validation performance, uncovering distinct influences at the token and sequence levels.
1 code implementation • 26 May 2023 • Atli Kosson, Dongyang Fan, Martin Jaggi
Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks.