{"id":6880,"library":"sglang-router","title":"SGLang Router","description":"SGLang Router (PyPI: `sglang-router`, current version 0.3.2) is a high-performance, Rust-based load balancer designed for SGLang instances, facilitating data parallelism and advanced request routing. It supports multiple load balancing algorithms, including cache-aware, power of two, random, and round robin, and is specialized for prefill-decode disaggregated serving architectures. The project has been evolving into the 'SGLang Model Gateway', aiming to become a full OpenAI API server with features like native tool calling and session management. It maintains an active development pace with frequent updates and bug fixes.","status":"active","version":"0.3.2","language":"en","source_language":"en","source_url":"https://github.com/sgl-project/sglang/tree/main/sgl-router","tags":["LLM","router","load balancing","inference","SGLang","Rust","AI","model gateway"],"install":[{"cmd":"pip install sglang-router","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Required for building from source or using the Rust binary directly; not a Python runtime dependency for wheel installs.","package":"Rust and Cargo","optional":true},{"reason":"The backend inference engine that sglang-router is designed to route requests to. 
Not a direct Python dependency, but a functional prerequisite.","package":"SGLang","optional":false}],"imports":[{"symbol":"Router","correct":"from sglang_router import Router"}],"quickstart":{"code":"import os\nfrom sglang_router import Router\n\n# NOTE: This quickstart assumes SGLang worker instances are already running\n# at these URLs (e.g., http://localhost:8000).\n# Override the defaults with the environment variables below.\nworker_urls = [\n    os.environ.get('SGLANG_WORKER_URL_1', 'http://localhost:8000'),\n    os.environ.get('SGLANG_WORKER_URL_2', 'http://localhost:8001'),\n]\n\ntry:\n    # Initialize the SGLang Router.\n    # By default, it runs in regular HTTP routing mode.\n    router = Router(worker_urls=worker_urls)\n    print(f\"SGLang Router initialized with workers: {worker_urls}\")\n\n    # This example only constructs the router; it does not serve traffic.\n    # For deployment, the router is typically launched as its own process.\n    print(\n        \"Router instance created. To run it, typically use \"\n        \"'python -m sglang_router.launch_router'. \"\n        \"Refer to the SGLang documentation for full deployment.\"\n    )\nexcept Exception as e:\n    print(f\"Error initializing SGLang Router: {e}\")\n    print(\"Ensure SGLang worker instances are running and accessible at the provided URLs.\")\n","lang":"python","description":"This quickstart demonstrates how to programmatically initialize the SGLang Router by providing a list of SGLang worker URLs. The router acts as a load balancer for these workers. 
For actual deployment and to handle incoming requests, the router is typically launched as a separate process with the `python -m sglang_router.launch_router` command. Ensure that SGLang worker instances are running and accessible at the specified URLs for the router to function correctly."},"warnings":[{"fix":"Update Prometheus dashboards and alerting rules to reflect the new 6-layer metrics architecture (protocol, router, worker, streaming, circuit breaker, policy) and unified error codes.","message":"The metrics architecture has been redesigned, significantly changing metric names and structures. Users relying on Prometheus dashboards or alerting rules will need to update them when upgrading to newer versions of SGLang (which sglang-router interacts with).","severity":"breaking","affected_versions":"SGLang 0.5.x and later (applies to sglang-router's interaction with SGLang)"},{"fix":"Update any custom worker management or monitoring logic to use UUIDs for worker identification.","message":"Worker resource management now uses UUIDs instead of network endpoints for identification. This is a breaking change for systems that directly manage or monitor workers based on their network addresses.","severity":"breaking","affected_versions":"SGLang 0.5.x and later (applies to sglang-router's interaction with SGLang)"},{"fix":"For performance-critical scenarios, always build and install the wheel package (`python -m build && pip install --force-reinstall dist/*.whl`) rather than using an editable install.","message":"Installing `sglang-router` in editable mode (`pip install -e .`) can lead to performance degradation. Editable installs are suitable for development but not recommended for performance testing or production environments.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Increase system-level file descriptor limits (ulimit) for the process running the SGLang Router and workers. 
Implement proper connection management, capacity limits, and graceful degradation strategies.","message":"Under high concurrent load (~32,768 connections), SGLang, including its router, may experience performance degradation and request failures because of file descriptor limits, which are usually a system-level constraint.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Review SGLang migration guides and release notes when upgrading SGLang backend versions to ensure `sglang-router` configurations remain compatible.","message":"Migrating from SGLang backend versions 0.3.x to 0.5.x (which sglang-router 0.3.2 will likely interact with) may require configuration updates due to backward-incompatible changes in the SGLang core.","severity":"breaking","affected_versions":"SGLang backend versions 0.3.x to 0.5.x"},{"fix":"Stay informed about the official SGLang documentation and GitHub releases for updates on the Model Gateway's features and any migration paths.","message":"The project is evolving from a simple load balancer into a more comprehensive 'SGLang Model Gateway', which aims to provide a full OpenAI API server with features such as native tool calling, session management, and direct gRPC communication. This architectural shift may introduce breaking changes in future major versions, particularly for users of these advanced features.","severity":"gotcha","affected_versions":"Future major versions (post 0.3.x)"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z","problems":[]}