vLLM Flash Attention Wrapper

library 2.6.2 · python
verified May 1, 2026

Forward-only flash-attention kernel optimized for vLLM inference. The latest version, 2.6.2, is a lightweight wrapper around the Flash Attention CUDA kernel that exposes a simplified forward-only API. Development is active and tracks vLLM releases.
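
To illustrate what a forward-only attention kernel computes, here is a minimal pure-Python sketch of blockwise attention with an online softmax, the general technique behind Flash Attention. It does not use this package's API or its CUDA kernel. The names `attention_forward` and `attention_reference` and the `block_size` parameter are hypothetical, chosen for illustration only.

```python
import math


def attention_reference(q, k, v):
    """Naive single-head attention: softmax(q k^T / sqrt(d)) v."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * vj[c] for w, vj in zip(weights, v)) / total
            for c in range(len(v[0]))
        ])
    return out


def attention_forward(q, k, v, block_size=2):
    """Blockwise forward pass with an online softmax (illustrative only).

    Keys and values are consumed in blocks, so the full score matrix is
    never materialized. No gradients are tracked: forward pass only.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        running_max = float("-inf")  # largest score seen so far
        running_sum = 0.0            # sum of exp(score - running_max)
        acc = [0.0] * len(v[0])      # unnormalized weighted sum of values
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum.
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                p = math.exp(s - new_max)
                running_sum += p
                acc = [a + p * x for a, x in zip(acc, vj)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out


# Small example inputs: 2 queries, 3 keys/values, head dimension 2.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
blockwise = attention_forward(q, k, v, block_size=2)
reference = attention_reference(q, k, v)
```

In this sketch the blockwise result matches the naive reference up to floating-point rounding; a real kernel applies the same rescaling idea on GPU tiles rather than Python lists.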
