SGLang
JSON →SGLang is a high-performance serving framework for large language models (LLMs) and vision-language models (VLMs), implemented as a domain-specific language embedded in Python. It optimizes LLM inference through advanced techniques like RadixAttention for KV cache reuse, continuous batching, speculative decoding, and various parallelization strategies. The library supports a broad range of models from Hugging Face and offers compatibility with OpenAI APIs. SGLang maintains an active development pace with frequent, often monthly or bi-monthly, releases and is currently at version 0.5.9.
Traffic · last 30 days ↓17% vs prev 7d
When AI assistants answer questions about this library, they read this page. · indexed since Fri Apr 03
top countries 🇺🇸 United States · 🇮🇳 India · 🇫🇷 France · 🇩🇪 Germany · 🇧🇷 Brazil