vLLM Flash Attention Wrapper
Forward-only flash-attention kernel optimized for vLLM inference. The latest version is 2.6.2, a lightweight wrapper around the Flash Attention CUDA kernel that exposes a simplified forward-only API. Development is active and tracks vLLM releases.
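To make "forward-only" concrete, here is a minimal pure-Python sketch of the computation such a kernel performs: scaled dot-product attention evaluated block by block with a running softmax, with no backward pass and no stored score matrix. It is an illustration only, not this package's API. The name `attention_forward` and the `block_size` parameter are hypothetical, and the real kernel runs on the GPU over batched, multi-head tensors.

```python
import math


def attention_forward(q, k, v, block_size=2):
    """Forward-only attention for a single head (illustrative sketch).

    q, k, v are lists of equal-width float vectors. Keys and values are
    consumed in blocks, and a running max and running sum keep the
    softmax numerically stable without materializing all scores.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        running_max = float("-inf")
        running_sum = 0.0
        acc = [0.0] * len(v[0])
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            # Scaled dot products between this query and the key block.
            scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum
            # (exp(-inf) is 0.0, so the first block starts from zero).
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                acc = [a + w * x for a, x in zip(acc, vj)]
            running_max = new_max
        # Normalize once at the end to obtain softmax-weighted values.
        out.append([a / running_sum for a in acc])
    return out
```

Because the rescaling is exact, the result should not depend on `block_size` beyond floating-point rounding; the block size only controls how much of the key/value sequence is processed at a time.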