Feb 14, 2026 Understanding Activation Memory Dynamics in Pipeline Parallelism Variants
Feb 07, 2026 How Thread Block Swizzling boosts L2 Cache Hit Rate in Matrix Multiplication
Jan 30, 2026 Implementing Flash Attention: Backward Pass in Triton