Search Results for author: Peizhuang Cong

Found 1 paper, 0 papers with code

Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing

no code implementations • 25 Apr 2024 • Peizhuang Cong, Aomufei Yuan, Shimao Chen, Yuxuan Tian, Bowen Ye, Tong Yang

To this end, we traced and analyzed the load of each expert across training iterations for several large language models, and defined a transient state characterized by "obvious load fluctuation" and a stable state characterized by "temporal locality".
