Flash Attention

library 2.8.3 · python
verified May 22, 2026

Flash Attention is a fast, memory-efficient implementation of exact attention for deep learning models, particularly Transformers. It restructures the attention computation into tiles so that the full attention matrix is never written to GPU high-bandwidth memory. This reduces memory traffic and makes it significantly faster and less memory-intensive than standard attention, while producing the same result. The library is currently stable at version 2.8.3, and version 4.0.0 is in active beta development, introducing new features and architectural changes. Its release cadence is driven by research advances and performance optimizations.
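
To illustrate how attention can be computed block by block without changing the result, here is a minimal pure-Python sketch. It is not the library's CUDA kernel or its API. The function names `naive_attention` and `tiled_attention` and the `block_size` parameter are hypothetical, chosen only for this example. The sketch keeps a running maximum and a running softmax denominator per query, so only one block of scores exists at a time.

```python
import math
import random


def naive_attention(q, k, v):
    """Standard attention: builds the full row of scores for every query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def tiled_attention(q, k, v, block_size=4):
    """Illustrative tiled attention: visits keys/values one block at a time.

    Only `block_size` scores are held at once; running statistics are
    rescaled whenever a later block raises the maximum score.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    d_v = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * d_v  # unnormalized weighted sum of value rows
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum
            # (exp(-inf) is 0.0, so the first block starts from zero).
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [a * correction + sum(w * vj[c] for w, vj in zip(weights, v_blk))
                   for c, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


# Compare the two on small random inputs (5 queries, 10 keys, dimension 8).
random.seed(0)
q = [[random.gauss(0, 1) for _ in range(8)] for _ in range(5)]
k = [[random.gauss(0, 1) for _ in range(8)] for _ in range(10)]
v = [[random.gauss(0, 1) for _ in range(8)] for _ in range(10)]

ref = naive_attention(q, k, v)
tiled = tiled_attention(q, k, v, block_size=4)
max_diff = max(abs(a - b) for r, t in zip(ref, tiled) for a, b in zip(r, t))
print("max abs difference:", max_diff)
```

The two functions agree up to floating-point rounding, which is the sense in which the tiled computation is exact. The speed and memory benefits come from running this pattern inside a fused GPU kernel, which this sketch does not attempt to reproduce.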
