17 Jan 2022 • Hao Wang, Yuxuan Qin, ChonLam Lao, Yanfang Le, Wenfei Wu, Kai Chen
However, switch memory is scarce compared to the volume of gradients transmitted in distributed training.
Scheduling