FlashInfer: Kernel Library for LLM Serving

library 0.6.7.post3 · python
verified May 20, 2026

FlashInfer is a high-performance kernel library for Large Language Model (LLM) inference and serving on NVIDIA GPUs. It provides efficient CUDA kernels for attention in the prefill and decode phases, including attention over a paged KV cache. The current version is 0.6.7.post3. The library is under active development, with frequent patch releases and nightly builds, so APIs may change between releases.
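To make the term "paged attention" concrete, the following is a minimal pure-Python sketch of single-head decode attention over a paged KV cache. It is a reference illustration only, not FlashInfer's API or its CUDA implementation; the names `paged_decode_attention`, `k_pages`, `v_pages`, `page_table`, and `page_size` are hypothetical and chosen for this example.

```python
import math

def paged_decode_attention(q, k_pages, v_pages, page_table, seq_len, page_size):
    """Reference decode attention for one query vector over a paged KV cache.

    Hypothetical layout (an assumption for this sketch, not FlashInfer's):
    k_pages[p][s] and v_pages[p][s] are the key/value vectors stored in
    slot s of physical page p; page_table[i] is the physical page that
    holds the i-th logical page of this sequence.
    """
    d = len(q)
    scale = 1.0 / math.sqrt(d)
    scores, values = [], []
    for pos in range(seq_len):
        page = page_table[pos // page_size]  # logical page -> physical page
        slot = pos % page_size               # offset inside that page
        k = k_pages[page][slot]
        scores.append(scale * sum(qi * ki for qi, ki in zip(q, k)))
        values.append(v_pages[page][slot])
    # Numerically stable softmax over the gathered scores.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(d)]

# Three physical pages of two slots each; the sequence uses pages 2 then 0.
k_pages = [[[1.0, 0.0], [0.0, 0.0]],
           [[0.0, 0.0], [0.0, 0.0]],
           [[0.0, 1.0], [1.0, 1.0]]]
v_pages = [[[5.0, 6.0], [99.0, 99.0]],
           [[0.0, 0.0], [0.0, 0.0]],
           [[1.0, 0.0], [3.0, 0.0]]]
page_table = [2, 0]

# A zero query gives uniform weights, so the result is the mean of the
# three cached values: [3.0, 2.0]. The unused slot (99.0) is never read.
out = paged_decode_attention([0.0, 0.0], k_pages, v_pages, page_table,
                             seq_len=3, page_size=2)
```

The point of the page table is that a sequence's keys and values need not be contiguous in memory: the cache can grow one fixed-size page at a time, and the kernel gathers the right slots through the indirection. A production kernel performs the same gather in parallel on the GPU across many heads and requests.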
