vLLM

library 0.19.0 · python
verified May 20, 2026

vLLM is a high-throughput, memory-efficient inference and serving engine for large language models (LLMs). It uses optimization techniques such as PagedAttention to improve LLM serving performance. The current version is 0.19.0, and the project releases frequently.
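The memory-efficiency claim rests largely on paging the attention key/value cache: instead of reserving one contiguous region per request, the cache is split into fixed-size blocks that are handed out on demand and tracked in a per-sequence block table. The following is a minimal toy sketch of that bookkeeping only. It is not vLLM's implementation, and the class name `PagedKVCache`, its methods, and the pool sizes are hypothetical names chosen for illustration.

```python
class PagedKVCache:
    """Toy block allocator illustrating the paging idea (not vLLM's code)."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size                # tokens stored per block
        self.free_blocks = list(range(num_blocks))  # physical block ids still unused
        self.block_tables = {}                      # sequence id -> list of physical block ids
        self.seq_lens = {}                          # sequence id -> tokens stored so far

    def append_token(self, seq_id: str) -> int:
        """Reserve room for one more token and return the physical block that holds it."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        # A new block is needed only when every allocated block is full.
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV cache blocks")
            table.append(self.free_blocks.pop(0))
        self.seq_lens[seq_id] = length + 1
        return table[-1]

    def free(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


# Two sequences share one pool; blocks are allocated only as tokens arrive.
cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")
cache.append_token("b")
print(cache.block_tables)  # {'a': [0, 1], 'b': [2]}
```

Because a sequence never holds more than one partially filled block, unused memory per request is bounded by the block size, and the blocks of a finished request can be reused immediately by another one.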

total hits 19
actors 6 distinct systems
last hit 1d ago GPTBot
GPTBot 6 · Script 4 · ChatGPT-User 4 · OAI-SearchBot 2 · Search engines 1

top countries 🇺🇸 United States · 🇩🇪 Germany · 🇮🇳 India · 🇵🇱 Poland · 🇨🇦 Canada