no code implementations • 12 Jun 2020 • Subhadeep Bhattacharya, Weikuan Yu, Fahim Tahmid Chowdhury
Large neural network models present a hefty communication challenge to distributed Stochastic Gradient Descent (SGD), with a communication complexity of O(n) per worker for a model of n parameters.