MLServer
MLServer is an open-source inference server for machine learning models, designed to serve any ML framework through a standard V2 inference protocol. It aims to provide a lightweight and performant solution for deploying models and supports both REST and gRPC endpoints. The current version is 1.7.1, and it is actively developed and maintained by SeldonIO with a regular release cadence.
Common errors
- No MLModel class found in module 'model'
  - cause: The `model.py` file does not define a class that inherits from `mlserver.MLModel`, or the class is not discoverable (e.g., a typo in the class name or a wrong file path).
  - fix: Ensure your model class correctly subclasses `mlserver.MLModel`, and point MLServer at it explicitly via the `implementation` field of `model-settings.json` (e.g., `"implementation": "model.MyModel"`). Verify that `model.py` is in the directory you run `mlserver start` from, or specify its path.
- RequestValidationError: 1 validation error for InferenceRequest
  - cause: The incoming `InferenceRequest` JSON payload does not conform to the V2 Inference Protocol specification (e.g., inputs missing required fields such as `name`, `shape`, or `datatype`, or carrying mismatched data types/shapes).
  - fix: Review the structure of your `InferenceRequest` to ensure it precisely matches the V2 protocol, paying close attention to each entry in the `inputs` array and its types.
- ModuleNotFoundError: No module named 'mlserver_tensorflow'
  - cause: You are attempting to serve a TensorFlow model, but the `mlserver-tensorflow` runtime library is not installed.
  - fix: Install the specific runtime package for TensorFlow: `pip install mlserver-tensorflow`.
- Input '...' missing 'name' field
  - cause: The V2 Inference Protocol mandates that every input and output carries a `name` field, and it is missing from your request or your model's response.
  - fix: Add a unique `name` string to each `RequestInput` in your `InferenceRequest` and each `ResponseOutput` in your `InferenceResponse` (e.g., `name='input-0'`, `name='output-0'`).
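Several of the errors above come down to a malformed request body. A minimal payload that satisfies the V2 checks looks like this (the input name and values are placeholders; every entry in `inputs` must carry `name`, `shape`, and `datatype`):

```json
{
  "inputs": [
    {
      "name": "input-0",
      "shape": [1, 2],
      "datatype": "FP32",
      "data": [10.0, 20.0]
    }
  ]
}
```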
Warnings
- breaking The `predict` method signature in `MLModel` changed between 0.x and 1.x. The parameter `payload` was renamed to `request` for clarity.
- breaking MLServer's configuration files (`settings.json`, `model-settings.json`) and environment variable prefixes underwent significant changes in 1.x, simplifying the overall configuration schema.
- gotcha To serve models from specific frameworks (e.g., Scikit-learn, TensorFlow, XGBoost), you must install the corresponding MLServer runtime package (e.g., `mlserver-sklearn`, `mlserver-tensorflow`). The core `mlserver` package does not include these by default.
- gotcha MLServer strictly adheres to the V2 Inference Protocol. Inputs and outputs in `InferenceRequest` and `InferenceResponse` must correctly specify `name`, `shape`, and `datatype` fields, especially for custom models.
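The runtime gotcha above is usually resolved through `model-settings.json`. A sketch for the Scikit-learn runtime, assuming `mlserver-sklearn` is installed (the model name and artifact URI here are placeholders):

```json
{
  "name": "my-sklearn-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}
```

The `implementation` field is the fully qualified class path of the runtime; for a custom model it would instead point at your own `MLModel` subclass (e.g., `model.MyModel`).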
Install
- pip install mlserver
- pip install mlserver[all]
Imports
- MLModel
from mlserver.model import MLModel
from mlserver import MLModel
- InferenceRequest
from mlserver.types import InferenceRequest
- InferenceResponse
from mlserver.types import InferenceResponse
- cli.main
from mlserver.cli import main
Quickstart
from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput
import numpy as np

class MyModel(MLModel):
    async def load(self) -> None:
        # In a real scenario, load your model artifacts here
        self.model = lambda x: x * 2  # A simple dummy function
        self.ready = True

    async def predict(self, request: InferenceRequest) -> InferenceResponse:
        # Decode the first input tensor into a NumPy array
        input_array = NumpyCodec.decode_input(request.inputs[0]).astype(np.float32)
        output_array = self.model(input_array)
        return InferenceResponse(
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name="output-0",
                    shape=list(output_array.shape),
                    datatype="FP32",
                    data=output_array.flatten().tolist(),
                )
            ],
        )
# To run this model:
# 1. Save the above code as `model.py` in an empty directory.
# 2. Open your terminal in that directory.
# 3. Ensure mlserver and numpy are installed: `pip install mlserver numpy`
# 4. Run the MLServer: `mlserver start .`
#
# You can then send an inference request (e.g., using curl in a new terminal):
# curl -X POST 'http://localhost:8080/v2/models/MyModel/infer' \
# -H 'Content-Type: application/json' \
# -d '{
# "inputs": [
# {
# "name": "input-0",
# "shape": [1, 2],
# "datatype": "FP32",
# "data": [10.0, 20.0]
# }
# ]
# }'
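The same request can be sent from Python instead of curl; a minimal sketch using only the standard library (the URL and model name assume the quickstart server above is running on the default port 8080):

```python
import json
from urllib import request as urlrequest

# V2 inference payload for the quickstart model (which doubles its input)
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 2],
            "datatype": "FP32",
            "data": [10.0, 20.0],
        }
    ]
}

def infer(url: str = "http://localhost:8080/v2/models/MyModel/infer") -> dict:
    """POST the payload to a running MLServer instance and return the parsed response."""
    req = urlrequest.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Requires the quickstart server to be running: `mlserver start .`
    print(infer()["outputs"][0]["data"])
```

The response follows the same V2 shape as the request: each entry in `outputs` carries `name`, `shape`, `datatype`, and `data`.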