Search Results for author: Dongyang Fan

Found 3 papers, 2 papers with code

Personalized Collaborative Fine-Tuning for On-Device Large Language Models

1 code implementation | 15 Apr 2024 | Nicolas Wagner, Dongyang Fan, Martin Jaggi

We explore on-device self-supervised collaborative fine-tuning of large language models with limited local data availability.

Towards an empirical understanding of MoE design choices

no code implementations | 20 Feb 2024 | Dongyang Fan, Bettina Messmer, Martin Jaggi

In this study, we systematically evaluate the impact of common design choices in Mixture of Experts (MoEs) on validation performance, uncovering distinct influences at token and sequence levels.

Ghost Noise for Regularizing Deep Neural Networks

1 code implementation | 26 May 2023 | Atli Kosson, Dongyang Fan, Martin Jaggi

Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks.
