Model Parallel Methods


This section compiles distributed model parallel methods for scaling deep learning to very large models. Each node is assigned a different subset of the model's layers. During forward propagation, computation starts on the node holding the first layers, then moves to the next node, and so on. Once forward propagation is complete, gradients are computed for the last node and its model parameters are updated; backpropagation then proceeds to the penultimate node, whose parameters are updated, and so on back to the first node.
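
Below is a minimal sketch of this layer-wise splitting in PyTorch, assuming two CUDA devices are available; the model, layer sizes, and device names are illustrative, not taken from any specific method on this page. In this sketch, autograd propagates gradients backward from the last stage to the first, and a single optimizer step then updates the parameters on both devices.

```python
import torch
import torch.nn as nn

# Sketch of naive layer-wise model parallelism across two GPUs.
# The early layers live on "cuda:0", the later layers on "cuda:1".
class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First block of layers, placed on the first device.
        self.stage1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU()).to("cuda:0")
        # Remaining layers, placed on the second device.
        self.stage2 = nn.Linear(256, 10).to("cuda:1")

    def forward(self, x):
        # Forward propagation starts on the node with the first layers...
        x = self.stage1(x.to("cuda:0"))
        # ...then the activation moves to the node with the next layers.
        return self.stage2(x.to("cuda:1"))

model = TwoStageModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(32, 784)            # dummy batch for illustration
targets = torch.randint(0, 10, (32,))

# The loss is computed on the last device; backward() then propagates
# gradients from the last stage back through to the first stage, and
# step() applies the parameter updates on both devices.
loss = criterion(model(inputs), targets.to("cuda:1"))
loss.backward()
optimizer.step()
```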

Image credit: Jordi Torres.

Subcategories