Understanding Activation Memory Dynamics in Pipeline Parallelism Variants
This is a pointer blog post to an interactive simulation that visualizes the memory dynamics of GPipe (the naive, flush-based pipeline parallelism schedule) vs PipeDream (the 1F1B schedule).
Conceptual Analysis
We compare two major pipeline parallelism strategies:
- GPipe (Standard): This approach uses a “flush-based” schedule in which the forward passes for all microbatches in a minibatch must complete before any backward pass begins. As the simulation demonstrates, activation memory therefore accumulates linearly with the number of microbatches, creating high peak memory pressure.
- PipeDream (1F1B): This approach uses the “One-Forward-One-Backward” schedule. Once the pipeline warms up, each worker alternates between a forward pass (stashing the activations of a new microbatch) and a backward pass (releasing the activations of an older one). The simulation highlights how this keeps memory usage stable: the number of microbatches whose activations are held at once is capped by the pipeline depth rather than by the number of microbatches per weight update.
- Reference: SOSP ’19: PipeDream: Generalized Pipeline Parallelism for DNN Training (see their Figures 3 and 4)
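To make the memory argument concrete, here is a minimal Python sketch that counts, for each stage, how many microbatches have finished their forward pass but not yet their backward pass (that is, how many sets of activations are stashed). It assumes one unit of activation memory per microbatch per stage and models only the per-stage order of operations, not timing. For 1F1B it uses the non-interleaved, flushed ordering (warm-up, steady state, cool-down) used by PipeDream-Flush and Megatron-LM rather than the asynchronous, weight-stashing schedule of the original PipeDream paper. All function names are hypothetical and not taken from the interactive tool.

```python
from typing import List, Tuple

# One operation on a stage: ("F" or "B", microbatch index)
Op = Tuple[str, int]


def gpipe_schedule(stage: int, num_stages: int, num_micro: int) -> List[Op]:
    """Hypothetical helper: GPipe-style flush, all forwards then all backwards.

    `stage` and `num_stages` are unused; they keep the signature uniform.
    """
    forwards = [("F", m) for m in range(num_micro)]
    backwards = [("B", m) for m in range(num_micro)]
    return forwards + backwards


def one_f_one_b_schedule(stage: int, num_stages: int, num_micro: int) -> List[Op]:
    """Hypothetical helper: non-interleaved 1F1B order for one stage (0-indexed)."""
    # Warm-up: earlier stages run more forwards before their first backward
    warmup = min(num_stages - stage - 1, num_micro)
    ops: List[Op] = [("F", m) for m in range(warmup)]
    next_f, next_b = warmup, 0
    # Steady state: one forward, then one backward
    while next_f < num_micro:
        ops.append(("F", next_f))
        next_f += 1
        ops.append(("B", next_b))
        next_b += 1
    # Cool-down: backward passes for the microbatches still in flight
    while next_b < num_micro:
        ops.append(("B", next_b))
        next_b += 1
    return ops


def peak_live_activations(ops: List[Op]) -> int:
    """Peak number of microbatches whose activations are stashed on this stage."""
    live, peak = 0, 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    num_stages, num_micro = 4, 8
    for name, schedule in [("GPipe", gpipe_schedule), ("1F1B", one_f_one_b_schedule)]:
        peaks = [
            peak_live_activations(schedule(s, num_stages, num_micro))
            for s in range(num_stages)
        ]
        print(f"{name}: peak stashed microbatches per stage = {peaks}")
```

With 4 stages and 8 microbatches, this prints `[8, 8, 8, 8]` for GPipe and `[4, 3, 2, 1]` for 1F1B. The GPipe peak grows with the number of microbatches, while the 1F1B peak on stage `s` works out to `min(num_stages - s, num_micro)`, which is bounded by the pipeline depth.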
This tool is designed to be an interactive reference to better understand the scheduling concepts and memory implications discussed in the PipeDream paper.