SGLang Router
SGLang Router (PyPI: `sglang-router`, current version 0.3.2) is a high-performance, Rust-based load balancer designed for SGLang instances, facilitating data parallelism and advanced request routing. It supports multiple load balancing algorithms, including cache-aware, power of two, random, and round robin, and is specialized for prefill-decode disaggregated serving architectures. The project has been evolving into the 'SGLang Model Gateway', aiming to become a full OpenAI API server with features like native tool calling and session management. It maintains an active development pace with frequent updates and bug fixes.
Warnings
- breaking The metrics architecture has been redesigned, significantly changing metric names and structures. Users relying on Prometheus dashboards or alerting rules will need to update them when upgrading to newer versions of SGLang (which sglang-router interacts with).
- breaking Worker resource management now uses UUIDs instead of network endpoints for identification. This is a breaking change for systems that directly manage or monitor workers based on their network addresses.
- gotcha Installing `sglang-router` in editable mode (`pip install -e .`) can lead to performance degradation. This is generally suitable for development but not recommended for performance testing or production environments.
- gotcha Under high concurrent load (~32,768 connections), SGLang, including its router, may experience performance degradation and request failures due to underlying file descriptor limits. This is often a system-level limitation.
- breaking Migrating from SGLang backend versions 0.3.x to 0.5.x (which sglang-router 0.3.2 will likely interact with) may require configuration updates due to backward-incompatible changes in the SGLang core.
- gotcha The project is actively evolving from a simple load balancer to a more comprehensive 'SGLang Model Gateway', which aims to provide a full OpenAI API server experience with advanced features like native tool calling, session management, and direct gRPC communication. This ongoing architectural shift may introduce new paradigms and potentially breaking changes in future major versions for users interacting with these advanced features.
Install
-
pip install sglang-router
Imports
- Router
from sglang_router import Router
Quickstart
import os
from sglang_router import Router
# NOTE: This quickstart assumes SGLang worker instances are running
# at the specified URLs (e.g., http://localhost:8000).
# Replace with actual worker URLs if available.
worker_urls = [
os.environ.get('SGLANG_WORKER_URL_1', 'http://localhost:8000'),
os.environ.get('SGLANG_WORKER_URL_2', 'http://localhost:8001')
]
try:
# Initialize the SGLang Router
# By default, it runs in regular HTTP routing mode.
router = Router(worker_urls=worker_urls)
print(f"SGLang Router initialized with workers: {worker_urls}")
# In a real application, you would typically start the router
# (e.g., in a separate thread or process) and then send requests to it.
# For demonstration, we just show initialization.
# Example of running the router process (requires a running event loop or main function)
# This part is conceptual as `Router` doesn't expose a simple `run()` method directly
# in this programmatic interface; it's often launched via `python -m`.
print("Router instance created. To run, typically use 'python -m sglang_router.launch_router'\n"+
"or integrate into an ASGI app. Refer to SGLang documentation for full deployment.")
except Exception as e:
print(f"Error initializing SGLang Router: {e}")
print("Ensure SGLang worker instances are running and accessible at the provided URLs.")