Block Sparse Flash Attention
Published in arXiv, 2025
We present Block-Sparse FlashAttention (BSFA), a drop-in replacement for dense attention kernels that tackles the quadratic-complexity bottleneck by exploiting block-level sparsity in the attention map, accelerating long-context inference while preserving model quality (a toy sketch of the block-sparse idea follows the citation below).
Recommended citation: Daniel Ohayon, Itay Lamprecht, Itay Hubara, Israel Cohen, Daniel Soudry, Noam Elata. (2025). "Block Sparse Flash Attention." arXiv preprint arXiv:2512.07011. https://arxiv.org/pdf/2512.07011
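As a rough illustration of the general block-sparse attention idea (not the BSFA kernel itself, whose block selection and fused implementation are described in the paper), the PyTorch sketch below splits queries and keys into fixed-size blocks and lets each query block attend only to the key/value blocks allowed by a block mask. The block size, mask construction, and function names here are assumptions for illustration only.

```python
# Toy block-sparse attention sketch (illustrative only; not the authors' kernel).
import torch

def block_sparse_attention(q, k, v, block_mask, block_size):
    """q, k, v: [seq_len, d]; block_mask: [num_blocks, num_blocks] bool,
    where block_mask[i, j] = True means query block i may attend to key block j."""
    seq_len, d = q.shape
    num_blocks = seq_len // block_size
    scale = d ** -0.5
    out = torch.zeros_like(q)
    for i in range(num_blocks):
        q_blk = q[i * block_size:(i + 1) * block_size]            # [B, d]
        # Gather only the key/value blocks this query block is allowed to see,
        # skipping the rest entirely -- this is where the compute saving comes from.
        kept = block_mask[i].nonzero(as_tuple=True)[0]
        k_sel = torch.cat([k[j * block_size:(j + 1) * block_size] for j in kept])
        v_sel = torch.cat([v[j * block_size:(j + 1) * block_size] for j in kept])
        attn = torch.softmax((q_blk @ k_sel.T) * scale, dim=-1)   # [B, |kept|*B]
        out[i * block_size:(i + 1) * block_size] = attn @ v_sel
    return out

# Usage on random data: 256 tokens, 64-dim head, blocks of 64; each query block
# attends to itself plus the first block (an arbitrary mask chosen for the demo).
torch.manual_seed(0)
seq_len, d, block_size = 256, 64, 64
q, k, v = (torch.randn(seq_len, d) for _ in range(3))
nb = seq_len // block_size
mask = torch.eye(nb, dtype=torch.bool)
mask[:, 0] = True
out = block_sparse_attention(q, k, v, mask, block_size)
print(out.shape)  # torch.Size([256, 64])
```

With a fixed number of attended blocks per query block, the cost grows roughly linearly with sequence length instead of quadratically; a production kernel would fuse this with the tiled, memory-efficient softmax of FlashAttention rather than materializing attention scores as above.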
