no code implementations • 25 Apr 2024 • Peizhuang Cong, Aomufei Yuan, Shimao Chen, Yuxuan Tian, Bowen Ye, Tong Yang
To this end, we traced and analyzed the load of each expert across training iterations for several large language models, and defined a transient state with "obvious load fluctuation" and a stable state with "temporal locality".
no code implementations • 26 Mar 2024 • Bowen Ye, Jianing Zhao, ShaoYuan Li, Xiang Yin
Simultaneously, we aim to maintain a pre-determined order among the agents' objective function values, which we refer to as the ordering constraints.