no code implementations • 18 Aug 2023 • Xiaoge Deng, Li Shen, Shengwei Li, Tao Sun, Dongsheng Li, DaCheng Tao
Stochastic gradient descent (SGD) performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
1 code implementation • 10 Jun 2022 • Zhiquan Lai, Shengwei Li, Xudong Tang, Keshi Ge, Weijie Liu, Yabo Duan, Linbo Qiao, Dongsheng Li
These features make it necessary to apply 3D parallelism, which integrates data parallelism, pipeline model parallelism and tensor model parallelism, to achieve high training efficiency.
no code implementations • 18 Oct 2021 • Shengwei Li, Zhiquan Lai, Dongsheng Li, Yiming Zhang, Xiangyu Ye, Yabo Duan
EmbRace introduces Sparsity-aware Hybrid Communication, which integrates AlltoAll and model parallelism into data-parallel training, so as to reduce the communication overhead of highly sparse parameters.