TPU Inference for vLLM
tpu-inference is a hardware plugin for vLLM, designed to enable efficient inference of large language models (LLMs) on Google Cloud TPUs. It unifies JAX and PyTorch under a single lowering path, allowing PyTorch model definitions to run performantly on TPUs without additional code changes, while also extending native support to JAX. The library aims to push TPU hardware performance limits and retain vLLM's standardized user experience. It is actively maintained by the vLLM project and Google Cloud, with releases tied to vLLM development.
Common errors
- 'Stuck' or hanging behavior during PyTorch `model.generate()` on TPU (sometimes investigated with `torch.autograd.set_detect_anomaly(True)`).
  cause: PyTorch's dynamic computation graphs and lazy execution model can interact poorly with XLA (the compiler for TPUs) during auto-regressive decoding, leading to frequent recompilations and performance bottlenecks.
  fix: Implement a manual decode loop and explicitly force execution with `torch_xla.core.xla_model.mark_step()` after each generated token. Keep input shapes static and minimize dynamic control flow inside the loop. Consider a JAX-based model if possible.
- `PackageNotFoundError: No package metadata found for tpu_inference`
  cause: The `tpu-inference` Python package is not installed, or the check is run in a different environment from the one where it was installed.
  fix: Ensure `vllm-tpu` (which includes `tpu-inference` as a dependency) is installed in the active Python environment, preferably a virtual environment, following the official documentation for Google Cloud TPU VMs. Use `pip install vllm-tpu` or `uv pip install vllm-tpu`.
- `JAX backends: []` (or only CPU/GPU devices listed)
  cause: The Python environment cannot detect a TPU device, usually because the code is not running on a Google Cloud TPU VM or the underlying driver/runtime setup is incomplete.
  fix: Run the code on a properly configured Google Cloud TPU VM. Verify the TPU runtime setup (e.g., the `PJRT_DEVICE` environment variable and `libtpu` availability) per the Google Cloud and vLLM TPU documentation. Access to a TPU VM and sufficient quota are prerequisites.
- `OutOfMemoryError`, or performance degradation with spikes in host memory usage during inference.
  cause: The model, or parts of it, exceeds the on-chip memory available on the TPU, forcing data movement between TPU and host memory and introducing significant latency.
  fix: Use a smaller model or a TPU generation with more memory. Apply quantization or pruning to reduce the memory footprint. Consider model partitioning or pipeline parallelism where supported, though vLLM handles much of this automatically.
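The manual decode loop suggested for the first error above can be sketched as follows. This is a minimal illustration, not the library's implementation: `model` and the token IDs are placeholders, `torch_xla` is only assumed present on an actual TPU VM (off-TPU the step marker becomes a no-op), and a real implementation would pre-pad the sequence to a static length rather than growing it each step.

```python
import torch

try:
    import torch_xla.core.xla_model as xm

    def sync() -> None:
        xm.mark_step()  # force execution of the lazily built XLA graph
except ImportError:
    def sync() -> None:
        pass  # no-op off-TPU so the sketch still runs

def greedy_decode(model, input_ids: torch.Tensor, max_new_tokens: int = 8) -> torch.Tensor:
    """Append one argmax token per step, cutting the XLA graph each time.

    Note: concatenation grows the sequence dynamically; for real TPU use,
    pre-pad to a static length to avoid recompilation per step.
    """
    for _ in range(max_new_tokens):
        logits = model(input_ids)  # expected shape: (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
        sync()  # one compiled step per token instead of one huge graph
    return input_ids
```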
Warnings
- gotcha tpu-inference is designed for specific TPU generations. Recommended generations are v7x, v5e, and v6e; older generations (v3, v4, v5p) are experimental. Ensure your Cloud TPU VM uses a compatible generation.
- gotcha When using PyTorch with `tpu-inference` via `vLLM`, direct `model.generate()` calls can be significantly slower than expected due to lazy graph execution and dynamic control flow in PyTorch/XLA. TPUs generally favor static computation graphs, which JAX utilizes more inherently.
- gotcha vLLM on TPUs uses a bucketization strategy for sequence lengths. Requests are rounded up to the nearest bucket size (e.g., a 176-token request might be treated as 256 tokens). This can lead to wasted computation and increased latency for workloads with varying or long sequence lengths if not accounted for.
- gotcha Speculative decoding, a technique to accelerate LLM inference, is currently not supported for TPUs when using vLLM.
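The bucketization effect can be illustrated with a small helper. The bucket sizes below are illustrative placeholders, not vLLM's actual configuration, which depends on the model and engine settings.

```python
import bisect

# Illustrative padded sequence-length buckets (powers of two).
BUCKETS = [16, 32, 64, 128, 256, 512, 1024, 2048]

def padded_len(num_tokens: int) -> int:
    """Round a request's token count up to the nearest bucket size."""
    i = bisect.bisect_left(BUCKETS, num_tokens)
    if i == len(BUCKETS):
        raise ValueError(f"request of {num_tokens} tokens exceeds the largest bucket")
    return BUCKETS[i]

# A 176-token request is padded to 256 tokens: 80 tokens of wasted compute.
print(padded_len(176))  # 256
```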
Install
- `pip install vllm-tpu`
- `uv pip install vllm-tpu`
Imports
- tpu_inference
import importlib.metadata
tpu_version = importlib.metadata.version("tpu_inference")
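The version lookup above raises `PackageNotFoundError` if the package is missing. A guarded variant (a hypothetical helper, not part of the library) degrades gracefully:

```python
import importlib.metadata
from typing import Optional

def tpu_inference_version() -> Optional[str]:
    """Return the installed tpu_inference version, or None if absent."""
    try:
        return importlib.metadata.version("tpu_inference")
    except importlib.metadata.PackageNotFoundError:
        return None

print(tpu_inference_version() or "tpu_inference not installed")
```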
Quickstart
import os
import importlib.metadata

# Ensure you have a Hugging Face token for model downloads if not public
# os.environ['HF_TOKEN'] = os.environ.get('HF_TOKEN', 'hf_...')

import jax
import vllm
from vllm.platforms import current_platform

try:
    tpu_version = importlib.metadata.version("tpu_inference")
    print(f"vLLM version: {vllm.__version__}")
    print(f"tpu_inference version: {tpu_version}")
    print(f"vLLM platform: {current_platform.get_device_name()}")
    print(f"JAX backends: {jax.devices()}")

    # Example of how you would typically use vLLM with the TPU backend.
    # This assumes a TPU VM environment and necessary model access.
    # For a full server quickstart, refer to the vLLM TPU documentation.
    # For instance:
    # from vllm.engine.arg_utils import AsyncEngineArgs
    # from vllm.engine.async_llm_engine import AsyncLLMEngine
    # engine_args = AsyncEngineArgs(model="google/gemma-2b-it")  # TPU backend is auto-detected
    # engine = AsyncLLMEngine.from_engine_args(engine_args)
    # print("vLLM engine with TPU backend initialized.")
except importlib.metadata.PackageNotFoundError:
    print("tpu-inference package not found. Please ensure it's installed in a TPU environment.")
except Exception as e:
    print(f"An error occurred during quickstart verification: {e}")
    print("Ensure you are running this in a Google Cloud TPU VM environment with appropriate setup.")