Search Results for author: Dongyang Fan

Found 3 papers, 2 papers with code

Personalized Collaborative Fine-Tuning for On-Device Large Language Models

1 code implementation • 15 Apr 2024 • Nicolas Wagner, Dongyang Fan, Martin Jaggi

We explore on-device self-supervised collaborative fine-tuning of large language models with limited local data availability.

Towards an empirical understanding of MoE design choices

no code implementations • 20 Feb 2024 • Dongyang Fan, Bettina Messmer, Martin Jaggi

In this study, we systematically evaluate the impact of common design choices in Mixture of Experts (MoEs) on validation performance, uncovering distinct influences at token and sequence levels.

Ghost Noise for Regularizing Deep Neural Networks

1 code implementation • 26 May 2023 • Atli Kosson, Dongyang Fan, Martin Jaggi

Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks.

