Flash Attention 4 (CUTE implementation)
Flash Attention 4 is the next-generation implementation of the Flash Attention algorithm, built on NVIDIA CuTe (the tensor and layout abstraction library that ships with CUTLASS). It provides highly optimized fused attention kernels for modern GPUs, supporting head dimensions up to 256 and several data types, including FP8. The current version, 4.0.0b12, is a beta; releases are frequent.