Flash Attention
Flash Attention is a fast, memory-efficient, exact attention algorithm for deep learning models, particularly Transformers. It processes the attention computation in tiles so that the full attention matrix is never written to GPU high-bandwidth memory, which reduces memory accesses and makes it significantly faster and less memory-intensive than standard attention. The library's current stable release is 2.8.3, and version 4.0.0 is in active beta development, introducing new features and architectural changes. Its release cadence is driven by research advances and performance optimizations.
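To make the "exact, but reordered" idea concrete, here is a minimal pure-Python sketch of the running (online) softmax that tiled attention relies on. It is an illustration only, not the library's CUDA implementation: the function names `standard_attention` and `tiled_attention` and the `block_size` parameter are assumptions made up for this example, and the inputs are plain nested lists rather than GPU tensors.

```python
import math


def standard_attention(q, k, v):
    """Reference version: compute the full row of scores for each query."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        row_max = max(scores)
        weights = [math.exp(s - row_max) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v)) / total
            for d in range(len(v[0]))
        ])
    return out


def tiled_attention(q, k, v, block_size=2):
    """Illustrative sketch: visit keys/values one block at a time.

    Only a running max, a running softmax denominator, and an output
    accumulator are kept per query, so the full score row is never stored.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dim_v = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * dim_v
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale what was accumulated under the old max to the new max.
            correction = math.exp(running_max - new_max)
            weights = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * correction + sum(weights)
            acc = [
                a * correction + sum(w * vj[d] for w, vj in zip(weights, v_blk))
                for d, a in enumerate(acc)
            ]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Calling both functions on the same small `q`, `k`, and `v` should give outputs that agree up to floating-point rounding, which is the sense in which the tiled computation is exact rather than an approximation. The real library applies the same idea to blocks held in on-chip GPU memory.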
Resources
API endpoints
full doc /v1/registry/flash-attn
install /v1/registry/flash-attn/install
compatibility /v1/registry/flash-attn/compatibility