30 Sep 2022 • William Zou, Hans De Sterck, Jun Liu
One of the largest bottlenecks in distributed training is communicating gradients across nodes.