SMG gRPC Servicer
SMG gRPC servicer implementations for LLM inference engines (vLLM, SGLang). Provides gRPC service stubs and helpers for the Shepherd Model Gateway ecosystem. Current version 0.5.2, requires Python >=3.10.
Install: pip install smg-grpc-servicer

Common errors
error ModuleNotFoundError: No module named 'smg_grpc_servicer'
cause Library not installed, or installed under the hyphenated name.
fix Run pip install smg-grpc-servicer, then import as import smg_grpc_servicer.

error TypeError: NvidiaGpuModelProviderAsyncio.__init__() got an unexpected keyword argument 'disconnected_debounce_s'
cause The keyword argument was renamed in v0.5.0 from `disconnected_debounce_s` to `disconnected_debounce`.
fix Use disconnected_debounce=5.0 instead of disconnected_debounce_s=5.0.

error AttributeError: 'NvidiaGpuModelProvider' object has no attribute 'run'
cause Using the sync provider, which does not have an async run method.
fix Switch to NvidiaGpuModelProviderAsyncio and use async with provider.run().

Warnings
breaking The sync `NvidiaGpuModelProvider` is deprecated and will be removed in a future release. Use `NvidiaGpuModelProviderAsyncio`.
fix Replace `NvidiaGpuModelProvider` with `NvidiaGpuModelProviderAsyncio` and adjust the code to use the `async with` context manager.
gotcha The name on PyPI is `smg-grpc-servicer`, but the import uses underscores: `smg_grpc_servicer`. A common mistake is to write `import smg-grpc-servicer` with hyphens, which is a syntax error.
fix Install with `pip install smg-grpc-servicer` and import with `import smg_grpc_servicer`.
gotcha The `inference_engine` parameter must exactly match the engine running on the gRPC workers. Supported values: 'sglang', 'vllm' (case-sensitive).
fix Set `inference_engine='sglang'` or `inference_engine='vllm'` (lowercase).
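Because a mismatched or mis-cased engine name only fails once a request reaches the workers, it can help to validate the value before constructing the provider. A minimal standalone sketch (the `check_inference_engine` helper is ours, not part of the library):

```python
SUPPORTED_ENGINES = ("sglang", "vllm")  # case-sensitive, per the gotcha above


def check_inference_engine(value: str) -> str:
    # Fail fast on anything that is not exactly 'sglang' or 'vllm'.
    if value not in SUPPORTED_ENGINES:
        raise ValueError(
            f"inference_engine must be one of {SUPPORTED_ENGINES}, got {value!r}"
        )
    return value
```

Call it on the value you intend to pass as `inference_engine=` so typos like `'SGLang'` surface immediately instead of as failed worker requests.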
Imports
- NvidiaGpuModelProviderAsyncio
  wrong: from smg_grpc_servicer import NvidiaGpuModelProvider
  correct: from smg_grpc_servicer import NvidiaGpuModelProviderAsyncio
Quickstart
import asyncio

from smg_grpc_servicer import NvidiaGpuModelProviderAsyncio


async def main():
    provider = NvidiaGpuModelProviderAsyncio(
        # "disconnected_debounce" replaced "disconnected_debounce_s" in v0.5.0
        workers=[{"url": "localhost:50051", "disconnected_debounce": 5.0}],
        request_interval=0.1,
        inference_engine="sglang",  # must match the engine on the workers
        name="my-engine",
    )
    async with provider.run():
        await asyncio.sleep(10)


asyncio.run(main())
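If older configuration still carries the pre-0.5.0 `disconnected_debounce_s` key in worker dicts, it can be renamed before the provider is constructed. A standalone sketch, assuming the only change needed is the key rename described in the warning above (the `migrate_worker` helper is ours, not a library API):

```python
def migrate_worker(worker: dict) -> dict:
    # Copy the worker config, renaming the pre-0.5.0 debounce key to its
    # v0.5.0+ name; configs already using the new name pass through unchanged.
    worker = dict(worker)
    if "disconnected_debounce_s" in worker:
        worker["disconnected_debounce"] = worker.pop("disconnected_debounce_s")
    return worker
```

Feeding `workers=[migrate_worker(w) for w in legacy_workers]` to `NvidiaGpuModelProviderAsyncio` then avoids the `unexpected keyword argument` TypeError when loading old configs.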