vLLM TPU
vLLM TPU is a variant of vLLM that runs on Google Cloud TPUs (v5e/v5p). It provides a high-throughput, memory-efficient inference and serving engine for large language models, with TPU-specific optimizations such as Pallas kernels for attention and quantization. The current version is 0.19.0; releases follow the main vLLM cadence (monthly).
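As a rough illustration of the inference engine described above, the following sketch runs offline batch generation. It assumes that vllm-tpu exposes the same `vllm` Python module and `LLM` / `SamplingParams` interface as upstream vLLM, and that it runs on a TPU VM with the package installed. The `format_prompts` and `run_offline_inference` helpers, the prompt template, and the sampling values are hypothetical and not taken from the project.

```python
from typing import List


def format_prompts(questions: List[str]) -> List[str]:
    """Wrap each question in a simple Q/A template (hypothetical helper)."""
    return [f"Question: {q}\nAnswer:" for q in questions]


def run_offline_inference(model_name: str, questions: List[str]) -> List[str]:
    """Generate one completion per question with vLLM's offline API.

    Assumption: vllm-tpu provides the upstream `vllm` module unchanged.
    """
    # Imported lazily so this file still loads on machines without vLLM.
    from vllm import LLM, SamplingParams

    # Illustrative sampling settings, not recommended defaults.
    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    # Loading the model compiles and places it on the available accelerator.
    llm = LLM(model=model_name)

    # generate() returns one RequestOutput per prompt, in prompt order.
    outputs = llm.generate(format_prompts(questions), sampling)
    return [out.outputs[0].text for out in outputs]
```

Calling `run_offline_inference` with a model identifier of your choice and a list of questions returns the generated strings in the same order as the questions. The lazy import keeps the prompt helper usable without a TPU.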
Resources
package: pypi.org/project/vllm-tpu/