no code implementations • 31 Jan 2024 • Zhitian Xie, Yinger Zhang, Chenyi Zhuang, Qitao Shi, Zhining Liu, Jinjie Gu, Guannan Zhang
However, the gate's routing mechanism also gives rise to narrow vision: each individual expert in the MoE cannot draw on more samples when learning its allocated sub-task, which in turn limits further improvement of the MoE's generalization ability.
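The narrow-vision effect can be illustrated with a minimal sketch of top-1 routing. The random gate scores below merely stand in for a learned gate, and the names `top1_route` and `samples_per_expert` are hypothetical, not taken from the paper. The point is only that hard routing partitions the data, so each expert trains on a fraction of the samples.

```python
import random
from collections import Counter
from typing import List


def top1_route(gate_scores: List[float]) -> int:
    """Return the index of the expert with the highest gate score."""
    return max(range(len(gate_scores)), key=lambda e: gate_scores[e])


def samples_per_expert(num_samples: int, num_experts: int, seed: int = 0) -> List[int]:
    """Count how many samples each expert receives under top-1 routing.

    Assumption: gate scores are drawn at random as a placeholder for a
    trained gating network.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(num_samples):
        scores = [rng.random() for _ in range(num_experts)]
        counts[top1_route(scores)] += 1
    return [counts[e] for e in range(num_experts)]


if __name__ == "__main__":
    # Each sample goes to exactly one expert, so no expert sees the full set.
    print(samples_per_expert(num_samples=1000, num_experts=4))
```

With four experts and a roughly balanced gate, each expert learns from about a quarter of the samples, which is the limitation the abstract describes.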
1 code implementation • 20 Dec 2023 • Yao Zhao, Zhitian Xie, Chen Liang, Chenyi Zhuang, Jinjie Gu
Instead of generating a single token at a time, we propose a Trie-based retrieval and verification mechanism that can accept several tokens in a single forward step.
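As a rough illustration of the retrieve-then-verify idea, the sketch below stores previously seen token sequences in a trie, retrieves a draft continuation for the current prefix, and keeps the longest part of the draft that agrees with the model. It is not the authors' implementation: `Trie`, `verify`, and the toy model `toy_next` are hypothetical names, and the verification loop calls the model once per position for clarity, whereas a real system would check the whole draft in one forward pass.

```python
from typing import Callable, Dict, List, Sequence


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.count = 0  # number of inserted sequences passing through this node


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, tokens: Sequence[int]) -> None:
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())
            node.count += 1

    def retrieve(self, prefix: Sequence[int], max_len: int) -> List[int]:
        """Walk down `prefix`, then extend along the most frequent branch."""
        node = self.root
        for tok in prefix:
            if tok not in node.children:
                return []  # prefix unseen: no draft available
            node = node.children[tok]
        draft: List[int] = []
        while node.children and len(draft) < max_len:
            tok, node = max(node.children.items(), key=lambda kv: kv[1].count)
            draft.append(tok)
        return draft


def verify(context: List[int], draft: List[int],
           next_token: Callable[[List[int]], int]) -> List[int]:
    """Accept draft tokens while they match the model's greedy choice.

    On the first mismatch the model's own token is emitted instead, so the
    result equals plain greedy decoding, just produced in fewer steps.
    """
    accepted: List[int] = []
    for tok in draft:
        expected = next_token(context + accepted)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    # Whole draft accepted: the model still contributes one more token.
    accepted.append(next_token(context + accepted))
    return accepted


def toy_next(tokens: List[int]) -> int:
    """Toy stand-in for a language model: next token is (last + 1) mod 10."""
    return (tokens[-1] + 1) % 10


if __name__ == "__main__":
    trie = Trie()
    trie.insert([1, 2, 3, 4, 7])
    context = [0, 1]
    draft = trie.retrieve(context[-1:], max_len=4)
    print(draft, verify(context, draft, toy_next))
```

In this toy run the draft agrees with the model for three tokens and diverges on the fourth, so four tokens are produced in one verification step instead of one.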