Model Parallel Methods


This section compiles distributed model parallel methods for scaling deep learning to very large models. Each node is assigned a different subset of the model's layers. During forward propagation, computation starts on the node holding the first layers, then moves to the next node, and so on. Once forward propagation is complete, gradients are computed for the last node and its model parameters are updated; backpropagation then proceeds to the penultimate node, whose parameters are updated, and so on back to the first node.
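
Below is a minimal sketch of this layer-wise splitting in PyTorch, assuming two CUDA devices are available; the model, layer sizes, and device names are illustrative, not taken from any specific method on this page. In this sketch, autograd propagates gradients backward from the last stage to the first, and a single optimizer step then updates the parameters on both devices.

```python
import torch
import torch.nn as nn

# Sketch of naive layer-wise model parallelism across two GPUs.
# The early layers live on "cuda:0", the later layers on "cuda:1".
class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First block of layers, placed on the first device.
        self.stage1 = nn.Sequential(nn.Linear(784, 256), nn.ReLU()).to("cuda:0")
        # Remaining layers, placed on the second device.
        self.stage2 = nn.Linear(256, 10).to("cuda:1")

    def forward(self, x):
        # Forward propagation starts on the node with the first layers...
        x = self.stage1(x.to("cuda:0"))
        # ...then the activation moves to the node with the next layers.
        return self.stage2(x.to("cuda:1"))

model = TwoStageModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(32, 784)            # dummy batch for illustration
targets = torch.randint(0, 10, (32,))

# The loss is computed on the last device; backward() then propagates
# gradients from the last stage back through to the first stage, and
# step() applies the parameter updates on both devices.
loss = criterion(model(inputs), targets.to("cuda:1"))
loss.backward()
optimizer.step()
```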

Image credit: Jordi Torres.

Subcategories