SMG gRPC Servicer


SMG gRPC servicer implementations for LLM inference engines (vLLM, SGLang). Provides gRPC service stubs and helpers for the Shepherd Model Gateway ecosystem. Current version 0.5.2, requires Python >=3.10.

pip install smg-grpc-servicer
error ModuleNotFoundError: No module named 'smg_grpc_servicer'
cause The library is not installed, or the import used the hyphenated distribution name instead of the underscored module name.
fix
Run `pip install smg-grpc-servicer`, then import with `import smg_grpc_servicer`.
error TypeError: NvidiaGpuModelProviderAsyncio.__init__() got an unexpected keyword argument 'disconnected_debounce_s'
cause Keyword argument was renamed in v0.5.0 from `disconnected_debounce_s` to `disconnected_debounce`.
fix
Use disconnected_debounce=5.0 instead of disconnected_debounce_s=5.0.
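If you have call sites scattered across a codebase, a small compatibility shim can apply the v0.5.0 rename before constructing the provider. The helper below is a hypothetical sketch, not part of the library:

```python
def migrate_provider_kwargs(kwargs: dict) -> dict:
    """Rename kwargs deprecated in v0.5.0 (hypothetical helper, not a library API)."""
    renamed = {"disconnected_debounce_s": "disconnected_debounce"}
    return {renamed.get(key, key): value for key, value in kwargs.items()}

old_kwargs = {"disconnected_debounce_s": 5.0, "request_interval": 0.1}
print(migrate_provider_kwargs(old_kwargs))
# {'disconnected_debounce': 5.0, 'request_interval': 0.1}
```

Pass the migrated dict to the provider with `NvidiaGpuModelProviderAsyncio(**migrate_provider_kwargs(old_kwargs), ...)`.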
error AttributeError: 'NvidiaGpuModelProvider' object has no attribute 'run'
cause Using the sync provider, which does not expose an async `run()` context manager.
fix
Switch to NvidiaGpuModelProviderAsyncio and use async with provider.run().
breaking The sync `NvidiaGpuModelProvider` is deprecated and will be removed in a future release. Use `NvidiaGpuModelProviderAsyncio`.
fix Replace `NvidiaGpuModelProvider` with `NvidiaGpuModelProviderAsyncio` and adjust code to use `async with` context manager.
gotcha The library name on PyPI is `smg-grpc-servicer` but the module imports with underscores: `smg_grpc_servicer`. A common mistake is writing `import smg-grpc-servicer`, which is a SyntaxError because hyphens are not valid in Python identifiers.
fix Install as `pip install smg-grpc-servicer` and import as `import smg_grpc_servicer`.
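The distribution-name vs. import-name split follows the usual Python packaging convention (hyphens in the pip name become underscores in the module name), which can be sketched as:

```python
DIST_NAME = "smg-grpc-servicer"            # name used with: pip install smg-grpc-servicer
MODULE_NAME = DIST_NAME.replace("-", "_")  # name used with: import smg_grpc_servicer

print(MODULE_NAME)
# smg_grpc_servicer
```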
gotcha The `inference_engine` parameter must match exactly the engine running on the gRPC workers. Supported values: 'sglang', 'vllm'. Case-sensitive.
fix Set `inference_engine='sglang'` or `inference_engine='vllm'` (lowercase).
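Because the value is case-sensitive and must match the workers exactly, a pre-flight check can catch typos before the provider connects. This validator is a hypothetical sketch, not a library API; the supported values are the two listed above:

```python
SUPPORTED_ENGINES = {"sglang", "vllm"}  # case-sensitive values from the docs

def check_engine(value: str) -> str:
    """Hypothetical pre-flight check for the inference_engine parameter."""
    if value not in SUPPORTED_ENGINES:
        raise ValueError(
            f"inference_engine must be one of {sorted(SUPPORTED_ENGINES)}, got {value!r}"
        )
    return value

print(check_engine("sglang"))
# sglang
```

Calling `check_engine("SGLang")` or `check_engine("vLLM")` raises a ValueError instead of failing later on the workers.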

Run a basic async gRPC provider for a single SGLang worker.

import asyncio
from smg_grpc_servicer import NvidiaGpuModelProviderAsyncio

async def main():
    provider = NvidiaGpuModelProviderAsyncio(
        workers=[{"url": "localhost:50051"}],
        disconnected_debounce=5.0,   # renamed from disconnected_debounce_s in v0.5.0
        request_interval=0.1,
        inference_engine="sglang",   # must exactly match the engine on the workers (lowercase)
        name="my-engine",
    )
    # run() is an async context manager; the provider stays connected for the block
    async with provider.run():
        await asyncio.sleep(10)

asyncio.run(main())