Understanding Activation Memory Dynamics in Pipeline Parallelism Variants

This is a short pointer post to an interactive simulation that visualizes the activation-memory dynamics of GPipe (the naive pipeline-parallel schedule) versus PipeDream's 1F1B schedule.

Launch Interactive Simulation

Conceptual Analysis

We compare two major pipeline parallelism strategies:

  1. GPipe (Standard): This approach uses a “flush-based” schedule in which the forward passes for all microbatches in a minibatch must complete before any backward passes begin. As the simulation demonstrates, stashed activations therefore accumulate linearly with the number of microbatches, creating high peak memory pressure.
  2. PipeDream (1F1B): This approach uses the “One-Forward-One-Backward” schedule. Once the pipeline warms up, each worker alternates between a forward pass (stashing activations for a new microbatch) and a backward pass (releasing the activations of an old microbatch). The simulation highlights how this keeps activation memory stable and capped by the pipeline depth rather than by the number of microbatches per weight update (see the sketch after this list).

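For intuition, here is a minimal sketch (separate from the interactive simulation itself) that computes the peak number of in-flight microbatch activations each stage must hold under the two schedules. The function names and the `num_stages` / `num_microbatches` parameters are illustrative assumptions, not values or code taken from the simulator.

```python
# Minimal sketch: peak in-flight activation counts per pipeline stage.
# Assumes `num_stages` pipeline stages and `num_microbatches` microbatches
# per weight update (minibatch); names are illustrative only.

def gpipe_peak_activations(num_stages: int, num_microbatches: int) -> list[int]:
    # GPipe flushes: every stage stashes activations for all microbatches
    # of the minibatch before any backward pass releases them.
    return [num_microbatches for _ in range(num_stages)]

def one_f_one_b_peak_activations(num_stages: int, num_microbatches: int) -> list[int]:
    # 1F1B: stage i warms up with at most (num_stages - i) microbatches in
    # flight, then alternates forward/backward, so its peak is capped by
    # pipeline depth rather than by the minibatch size.
    return [min(num_microbatches, num_stages - i) for i in range(num_stages)]

if __name__ == "__main__":
    stages, microbatches = 4, 16
    print("GPipe peak per stage:", gpipe_peak_activations(stages, microbatches))
    print("1F1B  peak per stage:", one_f_one_b_peak_activations(stages, microbatches))
    # GPipe peak per stage: [16, 16, 16, 16]
    # 1F1B  peak per stage: [4, 3, 2, 1]
```

With 4 stages and 16 microbatches, GPipe's peak grows with the microbatch count while 1F1B's peak stays bounded by the pipeline depth, which is exactly the gap the simulation animates.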
The tool is intended as an interactive reference for understanding the scheduling concepts and memory implications discussed in the PipeDream paper.



