MLServer MLflow Runtime

1.7.1 · active · verified Thu Apr 16

mlserver-mlflow provides an MLflow runtime for MLServer, allowing users to serve models logged with MLflow using the MLServer inference server. It's currently at version 1.7.1 and maintains a release cadence aligned with MLServer's development, receiving updates for bug fixes and compatibility with new MLflow/MLServer versions.
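In a typical deployment you would not instantiate the runtime programmatically; instead, you point MLServer at the model with a `model-settings.json` file and start the server with `mlserver start`. A minimal sketch of that config (the model name and URI below are placeholders):

```json
{
    "name": "my-mlflow-model",
    "implementation": "mlserver_mlflow.MLflowRuntime",
    "parameters": {
        "uri": "./my-model-artifact"
    }
}
```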

Install

pip install mlserver mlserver-mlflow
Quickstart

This quickstart demonstrates how to use `MLflowRuntime` programmatically to load an MLflow model and run an inference. It first creates a dummy scikit-learn model and logs it to a local MLflow tracking store, then passes its `runs:/` URI to `MLflowRuntime` via MLServer's `ModelSettings`, loads the model, and makes a prediction. The `asyncio.run(main())` call drives the asynchronous load and inference.

import os
import tempfile
import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
import numpy as np
import asyncio
from mlserver_mlflow import MLflowRuntime
from mlserver.settings import ModelSettings
from mlserver.types import InferenceRequest, RequestInput

# 1. Create a dummy MLflow model and log it locally
#    (In a real scenario, this model would already be logged)
temp_dir = tempfile.TemporaryDirectory()
model_base_path = os.path.join(temp_dir.name, "mlflow_models")
mlflow.set_tracking_uri(f"file://{model_base_path}/mlruns")
with mlflow.start_run():
    model = LogisticRegression()
    model.fit(np.array([[0,0],[1,1]]), np.array([0,1]))
    mlflow.sklearn.log_model(model, "model_artifact")
    # Use a runs:/ URI so MLflow resolves the artifact location itself
    # (artifact_uri already carries its own scheme, so don't prefix file://)
    model_uri = f"runs:/{mlflow.active_run().info.run_id}/model_artifact"

# 2. Instantiate and load MLflowRuntime
async def main():
    model_settings = ModelSettings(
        name="my-mlflow-model",
        implementation="mlserver_mlflow.MLflowRuntime",
        parameters={
            "uri": model_uri
        }
    )
    mlflow_runtime = MLflowRuntime(model_settings)
    await mlflow_runtime.load()

    # 3. Prepare and send inference request
    request_input = RequestInput(
        name="predict",
        shape=[1, 2],
        datatype="FP32",
        data=[0.5, 0.5]  # flat, row-major data matching the declared shape
    )
    inference_request = InferenceRequest(inputs=[request_input])

    response = await mlflow_runtime.predict(inference_request)
    print("Prediction:", response.outputs[0].data)

    await mlflow_runtime.unload()
    temp_dir.cleanup() # Clean up temporary model files

asyncio.run(main())
