vLLM TPU

library 0.19.0 · python
verified May 9, 2026

vLLM TPU is a variant of vLLM that runs on Google Cloud TPUs (v5e and v5p). It is a high-throughput, memory-efficient inference and serving engine for large language models, and it uses TPU-specific optimizations such as Pallas kernels for attention and quantization. The current version is 0.19.0; releases follow the main vLLM cadence (monthly).
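As a rough illustration of what using the engine might look like, the sketch below assumes the TPU variant exposes the same offline-inference Python API as mainline vLLM (`LLM`, `SamplingParams`, `LLM.generate`); this page does not confirm that. The helper names `sampling_kwargs` and `generate` are hypothetical, and the model name is left to the caller.

```python
from typing import Any, Dict, List


def sampling_kwargs(max_tokens: int = 64,
                    temperature: float = 0.8,
                    top_p: float = 0.95) -> Dict[str, Any]:
    """Build keyword arguments for vllm.SamplingParams (hypothetical helper)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    return {"max_tokens": max_tokens, "temperature": temperature, "top_p": top_p}


def generate(prompts: List[str], model: str,
             tensor_parallel_size: int = 1) -> List[str]:
    """Run offline batch generation (hypothetical helper).

    Assumption: a host with vLLM and its TPU backend installed.
    """
    # Imported lazily so this module still loads on machines without vLLM.
    from vllm import LLM, SamplingParams

    # tensor_parallel_size shards the model across that many accelerator chips.
    llm = LLM(model=model, tensor_parallel_size=tensor_parallel_size)
    outputs = llm.generate(prompts, SamplingParams(**sampling_kwargs()))
    # Each request returns one or more completions; keep the first of each.
    return [out.outputs[0].text for out in outputs]
```

On a TPU VM, calling `generate(["The capital of France is"], model=...)` with a model you have access to would return one completion string per prompt. Keeping the vLLM import inside the function is only a convenience so the parameter helper can be exercised without the engine installed.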
