TPU Inference for vLLM
tpu-inference is a hardware plugin for vLLM that enables efficient inference of large language models (LLMs) on Google Cloud TPUs. It unifies JAX and PyTorch under a single lowering path, so existing PyTorch model definitions can run efficiently on TPUs without code changes, and it also supports models written natively in JAX. The library aims to push the limits of TPU hardware performance while retaining vLLM's standardized user experience. It is actively maintained by the vLLM project and Google Cloud, and its releases are tied to vLLM development.
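As a rough illustration of what "retaining vLLM's standardized user experience" could look like in practice, the sketch below uses vLLM's ordinary offline-inference API (`LLM`, `SamplingParams`). It assumes vLLM and the tpu-inference plugin are already installed on a TPU VM and that no TPU-specific arguments are required, which the description suggests but this page does not confirm. The model name and the helper names `build_sampling_kwargs` and `generate_texts` are placeholders chosen for this example.

```python
from typing import Dict, List


def build_sampling_kwargs(max_tokens: int = 64, temperature: float = 0.0) -> Dict[str, float]:
    """Collect the sampling settings passed to vLLM's SamplingParams."""
    return {"max_tokens": max_tokens, "temperature": temperature}


def generate_texts(prompts: List[str], model: str = "Qwen/Qwen2.5-0.5B-Instruct") -> List[str]:
    """Run offline generation through vLLM's standard Python API.

    Assumption: with the plugin installed, the same call that works on
    other backends also targets the TPU. The model name is a placeholder.
    """
    # Imported lazily so this module can be loaded on machines without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    params = SamplingParams(**build_sampling_kwargs())
    outputs = llm.generate(prompts, params)
    # Each request output holds one or more completions; take the first.
    return [request_output.outputs[0].text for request_output in outputs]
```

On a machine with a working installation, `generate_texts(["What is a TPU?"])` would return a list containing one completion string.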
Resources
API endpoints
full doc: /v1/registry/tpu-inference
compatibility: /v1/registry/tpu-inference/compatibility
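The endpoint paths above are relative, and the registry host is not given on this page. The following sketch shows one way to turn them into full URLs and fetch them. The base URL `https://registry.example.com` is a placeholder, the helper names are hypothetical, and the JSON response format is an assumption.

```python
import json
import urllib.request
from typing import Any, Dict


def registry_urls(base_url: str, entry: str = "tpu-inference") -> Dict[str, str]:
    """Build absolute URLs for the two endpoint paths listed for an entry."""
    root = base_url.rstrip("/")
    full_doc = f"{root}/v1/registry/{entry}"
    return {
        "full_doc": full_doc,
        "compatibility": f"{full_doc}/compatibility",
    }


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder host: replace with the real registry host before fetching.
urls = registry_urls("https://registry.example.com")
```

With a real host substituted, `fetch_json(urls["compatibility"])` would retrieve the compatibility record.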