11 Oct 2023 • Jiamin Li, Qiang Su, Yitao Yang, Yimin Jiang, Cong Wang, Hong Xu
Existing MoE models adopt a fixed gating network in which every token is processed by the same number of experts.
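To make the fixed-gating behaviour concrete, here is a minimal sketch of a conventional top-k gate in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names (`softmax`, `fixed_top_k_gate`), the router logits, and the choice of k = 2 are all hypothetical. The point it shows is that every token is sent to exactly k experts, no matter how confident or ambiguous its routing scores are.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fixed_top_k_gate(token_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    The weights of the selected experts are renormalised to sum to 1.
    k is the same for every token, which is the "fixed" part of the gate.
    """
    probs = softmax(token_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


# Hypothetical router logits: 3 tokens scored against 4 experts.
router_logits = [
    [4.0, 0.1, 0.0, -1.0],  # one expert clearly dominates
    [1.0, 0.9, 0.8, 0.7],   # scores are nearly uniform
    [0.0, 2.0, 2.0, 0.0],   # two experts tie
]

# Assumed k = 2: each token is routed to exactly two experts.
assignments = [fixed_top_k_gate(row, k=2) for row in router_logits]

for token_id, experts in enumerate(assignments):
    print(token_id, experts)
```

In this sketch the first token still occupies a second expert even though one expert dominates its scores, and the second token is cut off at two experts even though its scores are nearly uniform.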