vLLM
vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs). It uses optimizations such as PagedAttention to improve LLM serving performance. The current version is 0.19.0, and releases are frequent, with regular updates and new features.
Resources
package pypi.org/project/vllm/
API endpoints
full doc /v1/registry/vllm
install /v1/registry/vllm/install
compatibility /v1/registry/vllm/compatibility
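The endpoint paths above are relative, and the listing does not name a host. As a minimal sketch, assuming a placeholder host (`https://registry.example` is hypothetical, as is the helper name `endpoint_url`), the full URLs could be built like this:

```python
from urllib.parse import urljoin

# Endpoint paths copied from the listing above.
ENDPOINTS = {
    "full doc": "/v1/registry/vllm",
    "install": "/v1/registry/vllm/install",
    "compatibility": "/v1/registry/vllm/compatibility",
}


def endpoint_url(base_url: str, name: str) -> str:
    """Join a registry host with one of the listed paths.

    base_url is an assumption: the listing gives paths only, no host.
    """
    return urljoin(base_url, ENDPOINTS[name])


if __name__ == "__main__":
    # Placeholder host; substitute the real registry host.
    for name in ENDPOINTS:
        print(name, endpoint_url("https://registry.example", name))
```

The response format of these endpoints is not described here, so the sketch stops at building the URL rather than parsing a reply.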