SMG gRPC Servicer


SMG gRPC servicer implementations for LLM inference engines (vLLM, SGLang). Provides gRPC service stubs and helpers for the Shepherd Model Gateway ecosystem. Current version 0.5.2, requires Python >=3.10.

pip install smg-grpc-servicer
error ModuleNotFoundError: No module named 'smg_grpc_servicer'
cause The library is not installed, or the import used the hyphenated distribution name instead of the underscored module name.
fix
Run `pip install smg-grpc-servicer`, then import with `import smg_grpc_servicer`.
error TypeError: NvidiaGpuModelProviderAsyncio.__init__() got an unexpected keyword argument 'disconnected_debounce_s'
cause Keyword argument was renamed in v0.5.0 from `disconnected_debounce_s` to `disconnected_debounce`.
fix
Use disconnected_debounce=5.0 instead of disconnected_debounce_s=5.0.
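If you have call sites scattered across a codebase, a small compatibility shim can apply the v0.5.0 rename before constructing the provider. The helper below is a hypothetical sketch, not part of the library:

```python
def migrate_provider_kwargs(kwargs: dict) -> dict:
    """Rename kwargs deprecated in v0.5.0 (hypothetical helper, not a library API)."""
    renamed = {"disconnected_debounce_s": "disconnected_debounce"}
    return {renamed.get(key, key): value for key, value in kwargs.items()}

old_kwargs = {"disconnected_debounce_s": 5.0, "request_interval": 0.1}
print(migrate_provider_kwargs(old_kwargs))
# {'disconnected_debounce': 5.0, 'request_interval': 0.1}
```

Pass the migrated dict to the provider with `NvidiaGpuModelProviderAsyncio(**migrate_provider_kwargs(old_kwargs), ...)`.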
error AttributeError: 'NvidiaGpuModelProvider' object has no attribute 'run'
cause Using the sync provider, which does not expose an async `run()` context manager.
fix
Switch to NvidiaGpuModelProviderAsyncio and use async with provider.run().
breaking The sync `NvidiaGpuModelProvider` is deprecated and will be removed in a future release. Use `NvidiaGpuModelProviderAsyncio`.
fix Replace `NvidiaGpuModelProvider` with `NvidiaGpuModelProviderAsyncio` and adjust code to use `async with` context manager.
gotcha The library name on PyPI is `smg-grpc-servicer` but the module imports with underscores: `smg_grpc_servicer`. A common mistake is writing `import smg-grpc-servicer`, which is a SyntaxError because hyphens are not valid in Python identifiers.
fix Install as `pip install smg-grpc-servicer` and import as `import smg_grpc_servicer`.
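The distribution-name vs. import-name split follows the usual Python packaging convention (hyphens in the pip name become underscores in the module name), which can be sketched as:

```python
DIST_NAME = "smg-grpc-servicer"            # name used with: pip install smg-grpc-servicer
MODULE_NAME = DIST_NAME.replace("-", "_")  # name used with: import smg_grpc_servicer

print(MODULE_NAME)
# smg_grpc_servicer
```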
gotcha The `inference_engine` parameter must match exactly the engine running on the gRPC workers. Supported values: 'sglang', 'vllm'. Case-sensitive.
fix Set `inference_engine='sglang'` or `inference_engine='vllm'` (lowercase).
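Because the value is case-sensitive and must match the workers exactly, a pre-flight check can catch typos before the provider connects. This validator is a hypothetical sketch, not a library API; the supported values are the two listed above:

```python
SUPPORTED_ENGINES = {"sglang", "vllm"}  # case-sensitive values from the docs

def check_engine(value: str) -> str:
    """Hypothetical pre-flight check for the inference_engine parameter."""
    if value not in SUPPORTED_ENGINES:
        raise ValueError(
            f"inference_engine must be one of {sorted(SUPPORTED_ENGINES)}, got {value!r}"
        )
    return value

print(check_engine("sglang"))
# sglang
```

Calling `check_engine("SGLang")` or `check_engine("vLLM")` raises a ValueError instead of failing later on the workers.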

Run a basic async gRPC provider for a single SGLang worker.

import asyncio
from smg_grpc_servicer import NvidiaGpuModelProviderAsyncio

async def main():
    provider = NvidiaGpuModelProviderAsyncio(
        workers=[{"url": "localhost:50051"}],
        disconnected_debounce=5.0,   # renamed from disconnected_debounce_s in v0.5.0
        request_interval=0.1,
        inference_engine="sglang",   # must exactly match the engine on the workers (lowercase)
        name="my-engine",
    )
    # run() is an async context manager; the provider stays connected for the block
    async with provider.run():
        await asyncio.sleep(10)

asyncio.run(main())