Block Sparse Flash Attention

Published in arXiv, 2025

Recommended citation: Daniel Ohayon, Itay Lamprecht, Itay Hubara, Israel Cohen, Daniel Soudry, Noam Elata. (2025). "Block Sparse Flash Attention." arXiv preprint arXiv:2512.07011. https://arxiv.org/pdf/2512.07011

Modern large language models increasingly require long contexts for reasoning and multi-document tasks, but attention’s quadratic complexity creates a severe computational bottleneck. We present Block-Sparse FlashAttention (BSFA), a drop-in replacement that accelerates long-context inference while preserving model quality. Unlike methods that predict block importance before computing attention scores, BSFA computes attention only on the important blocks.
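
To make the block-sparse idea concrete, below is a minimal reference sketch in PyTorch of attention restricted to a set of selected key/value blocks. It is an illustration under stated assumptions, not the paper's fused kernel: the function name `block_sparse_attention`, the `block_size` parameter, and the externally supplied boolean `block_mask` are hypothetical, and the way BSFA actually decides which blocks are important is not reproduced here.

```python
# Illustrative sketch only; not the BSFA kernel from the paper.
# `block_mask` is a hypothetical, externally supplied boolean matrix marking
# which key/value blocks each query block keeps.
import math
import torch

def block_sparse_attention(q, k, v, block_mask, block_size=64):
    """q, k, v: (seq_len, head_dim). block_mask: boolean tensor of shape
    (seq_len // block_size, seq_len // block_size). Returns the attention
    output (seq_len, head_dim), computed only over the selected blocks."""
    seq_len, head_dim = q.shape
    scale = 1.0 / math.sqrt(head_dim)
    out = torch.zeros_like(q)
    num_blocks = seq_len // block_size
    for i in range(num_blocks):
        q_blk = q[i * block_size:(i + 1) * block_size]
        kept = [j for j in range(num_blocks) if block_mask[i, j]]
        if not kept:
            continue  # this query block attends to no key/value block
        # Gather only the key/value blocks marked important for this query block.
        k_sel = torch.cat([k[j * block_size:(j + 1) * block_size] for j in kept])
        v_sel = torch.cat([v[j * block_size:(j + 1) * block_size] for j in kept])
        scores = (q_blk @ k_sel.T) * scale     # (block_size, block_size * len(kept))
        probs = torch.softmax(scores, dim=-1)  # softmax renormalized over kept blocks only
        out[i * block_size:(i + 1) * block_size] = probs @ v_sel
    return out

# Toy usage: 512-token sequence, 64-token blocks; keep the diagonal and the first block.
q = torch.randn(512, 64)
k = torch.randn(512, 64)
v = torch.randn(512, 64)
mask = torch.eye(8, dtype=torch.bool)
mask[:, 0] = True
y = block_sparse_attention(q, k, v, mask)
```

In a fused FlashAttention-style kernel, the same per-block loop would be carried out with an online softmax so the full score matrix is never materialized; the sketch above only shows how the sparsity pattern enters the computation.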

Download paper here: https://arxiv.org/pdf/2512.07011