Search Results for author: Bettina Messmer

Found 2 papers, 1 paper with code

Towards an empirical understanding of MoE design choices

no code implementations • 20 Feb 2024 • Dongyang Fan, Bettina Messmer, Martin Jaggi

In this study, we systematically evaluate the impact of common design choices in Mixture of Experts (MoEs) on validation performance, uncovering distinct influences at token and sequence levels.

Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks

2 code implementations • 26 May 2023 • Atli Kosson, Bettina Messmer, Martin Jaggi

This study investigates how weight decay affects the update behavior of individual neurons in deep neural networks through a combination of applied analysis and experimentation.

L2 Regularization
