FlashInfer: Kernel Library for LLM Serving
FlashInfer is a high-performance kernel library for accelerating Large Language Model (LLM) inference on NVIDIA GPUs. It provides efficient CUDA kernels for operations such as paged attention, prefill, and decode. The current version is 0.6.7.post3. The library is under active development, with frequent patch releases and nightly builds, so APIs may change between releases.
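FlashInfer's kernels are written in CUDA. The NumPy sketch below is NOT FlashInfer's API. It is a slow reference for the computation a paged decode-attention kernel performs: one new query token attends over a KV cache that is stored in fixed-size pages scattered across a shared pool. The function name `paged_decode_attention`, the page layout `[num_pages, 2, page_size, num_kv_heads, head_dim]` (K at index 0, V at index 1), and the grouped-query mapping (query head `h` reads KV head `h // group`) are illustrative assumptions, only loosely modeled on how paged KV caches are commonly laid out.

```python
import numpy as np


def paged_decode_attention(q, kv_pages, page_indices, last_page_len):
    """Reference decode attention over a paged KV cache (hypothetical helper).

    q:             [num_qo_heads, head_dim], query for the current decode step
    kv_pages:      [num_pages, 2, page_size, num_kv_heads, head_dim] page pool
                   (assumed layout: index 0 holds K, index 1 holds V)
    page_indices:  page ids owned by this request, in sequence order
    last_page_len: number of valid tokens in the final page
    """
    num_qo_heads, head_dim = q.shape
    page_size, num_kv_heads = kv_pages.shape[2], kv_pages.shape[3]
    group = num_qo_heads // num_kv_heads  # query heads per KV head (GQA)

    # Gather this request's pages and flatten them into contiguous K and V.
    pages = kv_pages[page_indices]  # [n_pages, 2, page_size, H_kv, D]
    k = pages[:, 0].reshape(-1, num_kv_heads, head_dim)
    v = pages[:, 1].reshape(-1, num_kv_heads, head_dim)

    # Drop the unused slots at the end of the last page.
    kv_len = (len(page_indices) - 1) * page_size + last_page_len
    k, v = k[:kv_len], v[:kv_len]

    # Share each KV head across `group` consecutive query heads.
    k = np.repeat(k, group, axis=1)  # [kv_len, H_qo, D]
    v = np.repeat(v, group, axis=1)

    # Scaled dot-product scores, then a numerically stable softmax per head.
    scores = np.einsum("hd,thd->ht", q, k) / np.sqrt(head_dim)
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)

    # Weighted sum of values: [num_qo_heads, head_dim].
    return np.einsum("ht,thd->hd", probs, v)
```

Paging lets many requests share one preallocated pool without copying or reserving contiguous memory per request; the kernel only needs the page list and the fill level of the last page. A real kernel avoids materializing the gathered K and V, which this reference does for clarity.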
Resources
Homepage: flashinfer.ai
API endpoints
Full doc: /v1/registry/flashinfer-python
Compatibility: /v1/registry/flashinfer-python/compatibility