-
Implementing Flash Attention: Backward Pass in Triton
In this follow-up post to Nathan Chen's Triton Flash Attention Kernel Walkthrough: The Forward Pass, we dive into gradient computation for queries, keys, and values in the backward pass.
In this follow-up post to Nathan Chen's Triton Flash Attention Kernel Walkthrough: The Forward Pass, we dive into gradient computation for queries, keys, and values in the backward pass.