FlashInfer: Kernel Library for LLM Serving
FlashInfer is a high-performance kernel library for Large Language Model (LLM) inference on NVIDIA GPUs. It provides efficient CUDA kernels for operations such as paged attention, prefill, and decode, exposed through Python wrappers that integrate with PyTorch. The library is pre-1.0 and under active development, with frequent patch releases and nightly builds, so expect API changes between versions.
Warnings
- gotcha FlashInfer is a kernel library requiring an NVIDIA GPU with a compatible CUDA runtime. It will not work on CPUs or other accelerators. Using pre-built wheels requires a matching CUDA toolkit version (e.g., cu118 for CUDA 11.8); a mismatch often leads to `RuntimeError` or `ModuleNotFoundError`.
- breaking The library is under active development and not yet at a 1.0 release. Frequent minor and patch releases (including nightly builds) may introduce API changes or breaking modifications to function signatures and class constructors.
- gotcha FlashInfer's API, particularly the paged-KV-cache attention wrappers, is relatively low-level. Incorrect page-table metadata (`indptr`, `indices`, last-page lengths) or workspace-buffer management can lead to subtle bugs, incorrect attention output, or illegal memory accesses.
- gotcha FlashInfer is tightly coupled with PyTorch for tensor operations and device management. While `torch` is a dependency, ensure your PyTorch version is compatible, especially when using specific CUDA versions or pre-built FlashInfer wheels.
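The page-table metadata mentioned above is the easiest thing to get wrong. As a minimal sketch (pure PyTorch, no GPU required; the helper name `build_page_table` is hypothetical, not part of FlashInfer), this is the CSR-style layout the paged-KV wrappers expect, assuming pages are allocated contiguously per sequence:

```python
import torch

def build_page_table(seq_lens, page_size):
    """Build CSR-style paged-KV metadata (indptr, indices, last_page_len)
    for a batch of variable-length sequences. Hypothetical helper for
    illustration; assumes contiguous page allocation."""
    indptr = [0]
    indices = []
    last_page_len = []
    next_page = 0
    for n in seq_lens:
        num_pages = (n + page_size - 1) // page_size  # ceil division
        indices.extend(range(next_page, next_page + num_pages))
        next_page += num_pages
        indptr.append(len(indices))
        # number of tokens occupying the final page (in 1..page_size)
        last_page_len.append(n - (num_pages - 1) * page_size)
    return (
        torch.tensor(indptr, dtype=torch.int32),
        torch.tensor(indices, dtype=torch.int32),
        torch.tensor(last_page_len, dtype=torch.int32),
    )

indptr, indices, last = build_page_table([50, 17], page_size=16)
print(indptr.tolist())  # [0, 4, 6] — seq 0 uses 4 pages, seq 1 uses 2
print(last.tolist())    # [2, 1]   — 50 = 3*16 + 2, 17 = 1*16 + 1
```

A mismatch between `indptr[-1]` and `len(indices)`, or a `last_page_len` outside `1..page_size`, is exactly the kind of setup error that produces wrong attention output or memory violations.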
Install
- pip install flashinfer-python
- pip install flashinfer-python --pre --extra-index-url https://flashinfer.ai/whl/cu121
Imports
- flashinfer
import flashinfer as fi
- BatchDecodeWithPagedKVCacheWrapper
from flashinfer import BatchDecodeWithPagedKVCacheWrapper
- BatchPrefillWithRaggedKVCacheWrapper
from flashinfer import BatchPrefillWithRaggedKVCacheWrapper
- single_decode_with_kv_cache
from flashinfer import single_decode_with_kv_cache
Quickstart
import torch
import flashinfer as fi
# Ensure CUDA is available
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available. FlashInfer requires a CUDA-enabled GPU.")
# Device and dtype
device = "cuda"
dtype = torch.float16
# Model parameters (simplified for example)
num_qo_heads = 32
num_kv_heads = 32
head_dim = 128
page_size = 16
batch_size = 1  # decoding one sequence
seq_len = 50    # tokens already in the KV cache for that sequence
# 1. Allocate the paged KV cache as a plain tensor (NHD layout):
#    [max_num_pages, 2 (K/V), page_size, num_kv_heads, head_dim]
num_pages = (seq_len + page_size - 1) // page_size
kv_cache = torch.randn(
    num_pages, 2, page_size, num_kv_heads, head_dim, dtype=dtype, device=device
)
# 2. Build the CSR-style page table for the batch (one sequence here)
kv_page_indptr = torch.tensor([0, num_pages], dtype=torch.int32, device=device)
kv_page_indices = torch.arange(num_pages, dtype=torch.int32, device=device)
kv_last_page_len = torch.tensor(
    [seq_len - (num_pages - 1) * page_size], dtype=torch.int32, device=device
)
# 3. Create the decode wrapper with a workspace buffer, then plan the kernel.
#    (Older releases name these steps begin_forward/forward instead of plan/run.)
workspace = torch.empty(128 * 1024 * 1024, dtype=torch.uint8, device=device)
decode_wrapper = fi.BatchDecodeWithPagedKVCacheWrapper(workspace, "NHD")
decode_wrapper.plan(
    kv_page_indptr,
    kv_page_indices,
    kv_last_page_len,
    num_qo_heads,
    num_kv_heads,
    head_dim,
    page_size,
    pos_encoding_mode="NONE",
)
# 4. Query for the next token, shape (batch_size, num_qo_heads, head_dim)
query = torch.randn(batch_size, num_qo_heads, head_dim, dtype=dtype, device=device)
# 5. Run the decode attention kernel (softmax scale defaults to 1/sqrt(head_dim))
output = decode_wrapper.run(query, kv_cache)
print(f"FlashInfer BatchDecode output shape: {output.shape}")
print("FlashInfer decode successful.")
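To sanity-check the wrapper's output, it helps to spell out what batch decode computes: for each sequence, single-query attention against the K/V tokens gathered from that sequence's pages. A CPU reference sketch (pure PyTorch; the function name is hypothetical and this is not FlashInfer's implementation, only the math it accelerates):

```python
import torch

def decode_attention_reference(q, kv_cache, page_indices, last_page_len):
    """Single-query attention over a paged KV cache laid out as
    kv_cache[page, 0=K/1=V, slot, kv_head, head_dim]. CPU reference only."""
    # Gather this sequence's pages into contiguous K and V
    k_pages = kv_cache[page_indices, 0]  # [num_pages, page_size, heads, dim]
    v_pages = kv_cache[page_indices, 1]
    page_size = kv_cache.shape[2]
    seq_len = (len(page_indices) - 1) * page_size + last_page_len
    k = k_pages.flatten(0, 1)[:seq_len]  # [seq_len, heads, dim]
    v = v_pages.flatten(0, 1)[:seq_len]
    scale = q.shape[-1] ** -0.5
    # q: [heads, dim]; scores: [heads, seq_len]
    scores = torch.einsum("hd,shd->hs", q.float(), k.float()) * scale
    attn = torch.softmax(scores, dim=-1)
    return torch.einsum("hs,shd->hd", attn, v.float())

torch.manual_seed(0)
kv = torch.randn(4, 2, 16, 8, 64)  # 4 pages, page_size 16, 8 KV heads, dim 64
q = torch.randn(8, 64)             # one query token, 8 heads
out = decode_attention_reference(q, kv, torch.arange(4), last_page_len=2)
print(out.shape)  # torch.Size([8, 64])
```

Note how the last page is only partially attended over (`last_page_len` tokens); a kernel that reads the full final page silently attends to stale or uninitialized slots, which is one of the subtle bugs the warnings above describe.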