TPU Inference for vLLM

library 0.13.3 · python
verified May 26, 2026

tpu-inference is a hardware plugin for vLLM that enables efficient inference of large language models (LLMs) on Google Cloud TPUs. It unifies JAX and PyTorch under a single lowering path, so existing PyTorch model definitions run on TPUs with good performance and no additional code changes, and JAX models are supported natively. The library aims to push the limits of TPU hardware performance while retaining vLLM's standardized user experience. It is actively maintained by the vLLM project and Google Cloud, with releases tied to vLLM development.
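Because the plugin keeps vLLM's standard user experience, offline generation should look the same as on any other vLLM backend. The sketch below uses only vLLM's public Python API (`LLM`, `SamplingParams`, `LLM.generate`). The model ID, the default sampling values, and the helper names `sampling_kwargs` and `generate` are illustrative assumptions, not part of tpu-inference, and the sketch assumes vLLM and the plugin are already installed on a TPU VM.

```python
"""Sketch: offline generation through vLLM's standard Python API."""

# Assumption: placeholder model ID; substitute any model your
# vLLM + tpu-inference installation supports.
MODEL_ID = "Qwen/Qwen2.5-1.5B-Instruct"


def sampling_kwargs(max_tokens: int = 64, temperature: float = 0.0) -> dict:
    """Collect keyword arguments for vllm.SamplingParams (hypothetical helper)."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {"max_tokens": max_tokens, "temperature": temperature}


def generate(prompts: list[str], model_id: str = MODEL_ID) -> list[str]:
    """Run the prompts through vLLM and return the first completion of each."""
    # Imported lazily so the helper above works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)
    outputs = llm.generate(prompts, SamplingParams(**sampling_kwargs()))
    # Each RequestOutput holds one or more completions; keep the first.
    return [out.outputs[0].text for out in outputs]
```

Calling `generate(["The capital of France is"])` returns one completion string per prompt. Nothing in the script refers to the TPU directly; under the assumptions above, backend selection is left to vLLM and the installed plugin.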
