Flash Linear Attention
Flash Linear Attention (FLA) is a Python library providing efficient, Triton-based implementations of state-of-the-art linear attention models and emerging sequence modeling architectures. It targets high-performance training and inference on NVIDIA, AMD, and Intel GPUs. As of version 0.4.2, the library is actively maintained with frequent releases and offers optimized kernels, fused modules, and integration-ready layers for PyTorch and Hugging Face models.
Resources
API endpoints
full doc /v1/registry/flash-linear-attention
compatibility /v1/registry/flash-linear-attention/compatibility