MLServer

1.7.1 · active · verified Thu Apr 16

MLServer is an open-source inference server for machine learning models, designed to serve models from any ML framework over the standard V2 (Open Inference) protocol. It aims to be a lightweight, performant solution for deploying models and exposes both REST and gRPC endpoints. The current version is 1.7.1, actively developed and maintained by SeldonIO with a regular release cadence.

Quickstart

This quickstart defines a simple `MyModel` runtime that doubles its input. Save the code as `model.py` alongside a `model-settings.json` that points MLServer at the class, then run `mlserver start .` to launch the server. An example `curl` command demonstrates how to send an inference request to the running server.
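`mlserver start .` discovers models through a `model-settings.json` file in the working directory; without one there is nothing to load. A minimal settings file for this quickstart looks like the following (the `name` field determines the `/v2/models/<name>/infer` URL used in the `curl` example, and `implementation` is the `<module>.<class>` path to the runtime):

```json
{
    "name": "MyModel",
    "implementation": "model.MyModel"
}
```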

from mlserver import MLModel
from mlserver.codecs import NumpyCodec
from mlserver.types import InferenceRequest, InferenceResponse

class MyModel(MLModel):
    async def load(self):
        # In a real scenario, load your model artifacts here
        self.model = lambda x: x * 2  # A simple dummy function
        self.ready = True

    async def predict(self, request: InferenceRequest) -> InferenceResponse:
        # NumpyCodec converts between V2 tensor payloads and numpy arrays,
        # avoiding direct access to the payload's internal representation
        input_array = NumpyCodec.decode_input(request.inputs[0])
        output_array = self.model(input_array)

        return InferenceResponse(
            model_name=self.name,  # required field on InferenceResponse
            outputs=[
                NumpyCodec.encode_output(name="output-0", payload=output_array)
            ],
        )

# To run this model:
# 1. Save the code above as `model.py` in an empty directory.
# 2. In the same directory, create a `model-settings.json` telling MLServer
#    which class to load:
#      {"name": "MyModel", "implementation": "model.MyModel"}
# 3. Ensure MLServer is installed: `pip install mlserver`
# 4. Start the server from that directory: `mlserver start .`
#
# You can then send an inference request (e.g., using curl in a new terminal):
# curl -X POST 'http://localhost:8080/v2/models/MyModel/infer' \
#      -H 'Content-Type: application/json' \
#      -d '{
#            "inputs": [
#              {
#                "name": "input-0",
#                "shape": [1, 2],
#                "datatype": "FP32",
#                "data": [10.0, 20.0]
#              }
#            ]
#          }'
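The JSON body the `curl` command sends can also be assembled in Python. The sketch below uses only the standard library to build the same V2 request payload and mimics the doubling that `MyModel` applies; no running server is assumed (a live server on MLServer's default REST port, 8080, would return these values in `outputs[0].data` of its JSON response).

```python
import json

# The same V2 inference request the curl example sends
request_body = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 2],
            "datatype": "FP32",
            "data": [10.0, 20.0],
        }
    ]
}
payload = json.dumps(request_body)

# MyModel doubles its input, so this is what a live server would compute
decoded = json.loads(payload)
doubled = [x * 2 for x in decoded["inputs"][0]["data"]]
print(doubled)  # [20.0, 40.0]
```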
