Matrice Inference Utilities

0.1.166 · active · verified Fri Apr 17

matrice-inference is a Python library providing common server utilities for Matrice.ai services, specifically designed for building and deploying machine learning inference services with FastAPI. It offers foundational classes and models for defining inference endpoints, handling requests, and managing the service lifecycle. The library is currently at version 0.1.166; its pre-1.0 status signals a rapid release cadence and the potential for breaking API changes.

Common errors

Warnings

Install
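
Assuming the package is published on PyPI under the name shown in the title, it can be installed with pip. Pinning the documented version is a sensible precaution given the pre-1.0 API:

```shell
# Install the library; the version pin matches the release documented here
pip install matrice-inference==0.1.166
```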

Imports

Quickstart

This quickstart demonstrates how to create a basic inference service using `matrice-inference`. It involves defining custom request/response models, subclassing `BaseInferenceService` and implementing its `warmup` and `predict` methods, configuring the service, and finally using `create_app` to generate a runnable FastAPI application. Save the code as `main.py` and run it with `uvicorn main:app --host 0.0.0.0 --port 8000`. You can then interact with the generated API at `http://localhost:8000/docs`.

import asyncio  # used to simulate async I/O in warmup

from fastapi import FastAPI
from pydantic import BaseModel

from matrice_inference.api.app import create_app
from matrice_inference.base_inference import BaseInferenceService
from matrice_inference.config import InferenceConfig

# Define your custom request and response models
class MyInferenceRequest(BaseModel):
    text: str
    upper_case: bool = False

class MyInferenceResponse(BaseModel):
    processed_text: str
    original_length: int

# Implement your inference service
class MyService(BaseInferenceService[MyInferenceRequest, MyInferenceResponse]):
    def __init__(self, config: InferenceConfig):
        super().__init__(config)
        self.is_ready = False
        print(f"Service '{config.service_name}' initialized.")

    async def warmup(self):
        """Simulate loading a model."""
        print("Warming up MyService...")
        await asyncio.sleep(0.01) # Simulate async I/O
        self.is_ready = True
        print("MyService is ready.")

    async def predict(self, request: MyInferenceRequest) -> MyInferenceResponse:
        """Perform actual inference."""
        if not self.is_ready:
            raise RuntimeError("Service not warmed up.")

        processed_text = request.text
        if request.upper_case:
            processed_text = request.text.upper()

        return MyInferenceResponse(
            processed_text=processed_text,
            original_length=len(request.text)
        )

# Create a minimal configuration
inference_config = InferenceConfig(
    service_name="MyUpperCaseService",
    model_name="text_processor",
    model_version="1.0.0"
)

# Instantiate your service
my_service = MyService(inference_config)

# Create the FastAPI application
app: FastAPI = create_app(
    inference_service=my_service,
    request_model=MyInferenceRequest,
    response_model=MyInferenceResponse
)

# To run this, save as `main.py` and execute in your terminal:
# uvicorn main:app --host 0.0.0.0 --port 8000
# Then open http://localhost:8000/docs in your browser to test the API.
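
Once the server is running, the exact route depends on what `create_app` registers; the interactive docs at `http://localhost:8000/docs` list the actual paths. Assuming a conventional `/predict` POST endpoint (an assumption, not confirmed by the library), a request might look like:

```shell
# Hypothetical endpoint path -- check /docs for the real route
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "hello world", "upper_case": true}'
# Expected response shape: {"processed_text": "HELLO WORLD", "original_length": 11}
```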
