Feb 14, 2026 Understanding Activation Memory Dynamics in Pipeline Parallelism Variants
Feb 07, 2026 How Thread Block Swizzling boosts L2 Cache Hit Rate in Matrix Multiplication