no code implementations • 1 May 2024 • Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, WeiPeng Chen, Bin Cui
Consequently, the GPU spends most of its time on memory transfer instead of computation.
1 code implementation • 5 Jul 2023 • Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui
Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models.
no code implementations • 8 Apr 2023 • Xiaonan Nie, Xupeng Miao, Zilong Wang, Zichao Yang, Jilong Xue, Lingxiao Ma, Gang Cao, Bin Cui
We first present an empirical analysis on the problems and opportunities of training MoE models, which motivates us to overcome the routing imbalance and fluctuation problems by a dynamic expert management and device placement mechanism.
no code implementations • 6 Mar 2023 • Xiaonan Nie, Yi Liu, Fangcheng Fu, Jinbao Xue, Dian Jiao, Xupeng Miao, Yangyu Tao, Bin Cui
Recent years have witnessed the unprecedented achievements of large-scale pre-trained models, especially the Transformer models.
2 code implementations • 25 Nov 2022 • Xupeng Miao, Yujie Wang, Youhe Jiang, Chunan Shi, Xiaonan Nie, Hailin Zhang, Bin Cui
Transformer models have achieved state-of-the-art performance on various domains of applications and gradually becomes the foundations of the advanced large deep learning (DL) models.
2 code implementations • 29 Dec 2021 • Xiaonan Nie, Xupeng Miao, Shijie Cao, Lingxiao Ma, Qibin Liu, Jilong Xue, Youshan Miao, Yi Liu, Zhi Yang, Bin Cui
Then it diversifies the experts and continues to train the MoE with a novel Dense-to-Sparse gate (DTS-Gate).
3 code implementations • 14 Dec 2021 • Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi Yang, Yangyu Tao, Bin Cui
Embedding models have been an effective learning paradigm for high-dimensional data.