Matrice Inference Utilities
matrice-inference is a Python library providing common server utilities for Matrice.ai services, specifically designed for building and deploying machine learning inference services with FastAPI. It offers foundational classes and models for defining inference endpoints, handling requests, and managing the service lifecycle. It is currently at version 0.1.166; the pre-1.0 status signals a rapid release cadence and potential API changes.
Common errors
- ModuleNotFoundError: No module named 'matrice_inference.base_inference'
  Cause: the `matrice-inference` package is not installed, or the import path has a typo.
  Fix: install it in your active environment (`pip install matrice-inference`) and double-check the import statement: `from matrice_inference.base_inference import BaseInferenceService`.
- TypeError: Can't instantiate abstract class MyService with abstract methods predict, warmup
  Cause: you subclassed `BaseInferenceService` but did not implement one or both of the required abstract methods (`warmup` and `predict`).
  Fix: implement both `async def warmup(self):` and `async def predict(self, request: RequestType) -> ResponseType:` in your service class. They must be `async` functions; a non-`async` implementation satisfies the abstract-method check but fails later at runtime.
- pydantic.errors.PydanticImportError: Pydantic V1 is installed, but Matrice Inference requires Pydantic V2.
  Cause: your Python environment has a conflicting version of Pydantic. `matrice-inference` specifically requires Pydantic v2 (e.g., `pydantic>=2.7.0,<2.8.0`).
  Fix: upgrade Pydantic into the required v2 range: `pip install --upgrade "pydantic>=2.7.0,<2.8.0"`. If other packages prevent this, use a separate virtual environment or a dependency management tool to resolve the conflict.
- RuntimeError: Service not warmed up.
  Cause: your custom `predict` method was called before `warmup` completed, i.e. before the service's `is_ready` flag was set to True. `create_app` registers a startup event that calls `warmup`, but manual calls or faulty `is_ready` logic can bypass it.
  Fix: ensure your `warmup` method sets the ready state (e.g., `self.is_ready = True`) and that you use `create_app` so FastAPI manages the startup lifecycle. Do not call `predict` manually before the service has fully started.
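The abstract-method `TypeError` above comes from Python's own `abc` machinery rather than anything specific to this library. A minimal stdlib-only reproduction, using a hypothetical `StubBaseService` as a stand-in for `BaseInferenceService`:

```python
import abc
import asyncio


class StubBaseService(abc.ABC):
    """Stand-in for BaseInferenceService: an ABC with two required async methods."""

    @abc.abstractmethod
    async def warmup(self): ...

    @abc.abstractmethod
    async def predict(self, request): ...


class IncompleteService(StubBaseService):
    async def warmup(self):  # predict is missing, so instantiation fails
        pass


class CompleteService(StubBaseService):
    async def warmup(self):
        self.is_ready = True

    async def predict(self, request):
        return request.upper()


try:
    IncompleteService()
except TypeError as exc:
    print(exc)  # message names the missing abstract method 'predict'

svc = CompleteService()
result = asyncio.run(svc.predict("hello"))
print(result)  # HELLO
```

Note that the ABC check only verifies that the methods exist; it does not verify they are `async`, which is why a non-`async` implementation surfaces as a different error later.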
Warnings
- breaking: As a pre-1.0 library (version 0.1.x), `matrice-inference` does not guarantee API stability between minor releases. Expect frequent breaking changes to class signatures, function parameters, or module structure without explicit warnings in a changelog.
- gotcha: `BaseInferenceService` is an abstract base class requiring both abstract methods (`warmup` and `predict`) to be implemented as `async` functions in your concrete service class. Omitting a method results in a `TypeError` at instantiation.
- gotcha: `matrice-inference` has strict Pydantic v2 dependencies (e.g., `pydantic>=2.7.0,<2.8.0`). If your project uses Pydantic v1 or an incompatible Pydantic v2 range due to other dependencies, you will encounter `PydanticImportError` or other runtime issues.
- gotcha: This library is primarily an internal utility for Matrice.ai. Its documentation for external users is minimal, and unstated assumptions about the operational environment or pre-existing infrastructure can lead to unexpected behavior in other setups.
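Given the pre-1.0 churn and the strict Pydantic range noted above, pinning exact versions is a reasonable defense. A sketch of a `requirements.txt` (the Pydantic range comes from the warning above; the exact `matrice-inference` version should be whichever one you have actually tested against):

```
matrice-inference==0.1.166
pydantic>=2.7.0,<2.8.0
```

Re-run your test suite before bumping either pin, since minor releases may change class signatures or module paths.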
Install
pip install matrice-inference
Imports
- BaseInferenceService
from matrice_inference.base_inference import BaseInferenceService
- create_app
from matrice_inference.api.app import create_app
- InferenceConfig
from matrice_inference.config import InferenceConfig
Quickstart
import asyncio  # used to simulate async I/O in warmup

from fastapi import FastAPI
from pydantic import BaseModel

from matrice_inference.api.app import create_app
from matrice_inference.base_inference import BaseInferenceService
from matrice_inference.config import InferenceConfig


# Define your custom request and response models
class MyInferenceRequest(BaseModel):
    text: str
    upper_case: bool = False


class MyInferenceResponse(BaseModel):
    processed_text: str
    original_length: int


# Implement your inference service
class MyService(BaseInferenceService[MyInferenceRequest, MyInferenceResponse]):
    def __init__(self, config: InferenceConfig):
        super().__init__(config)
        self.is_ready = False
        print(f"Service '{config.service_name}' initialized.")

    async def warmup(self):
        """Simulate loading a model."""
        print("Warming up MyService...")
        await asyncio.sleep(0.01)  # Simulate async I/O (e.g., loading weights)
        self.is_ready = True
        print("MyService is ready.")

    async def predict(self, request: MyInferenceRequest) -> MyInferenceResponse:
        """Perform the actual inference."""
        if not self.is_ready:
            raise RuntimeError("Service not warmed up.")
        processed_text = request.text.upper() if request.upper_case else request.text
        return MyInferenceResponse(
            processed_text=processed_text,
            original_length=len(request.text),
        )


# Create a minimal configuration
inference_config = InferenceConfig(
    service_name="MyUpperCaseService",
    model_name="text_processor",
    model_version="1.0.0",
)

# Instantiate your service and create the FastAPI application
my_service = MyService(inference_config)
app: FastAPI = create_app(
    inference_service=my_service,
    request_model=MyInferenceRequest,
    response_model=MyInferenceResponse,
)

# To run this, save as `main.py` and execute in your terminal:
#   uvicorn main:app --host 0.0.0.0 --port 8000
# Then open http://localhost:8000/docs in your browser to test the API.
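The ready-flag pattern in `warmup`/`predict` above can be exercised in isolation to see why calling `predict` too early raises `RuntimeError`. This is a stdlib-only sketch with a hypothetical `ToyService` that mimics the lifecycle, not the library's actual classes:

```python
import asyncio


class ToyService:
    """Mimics the warmup/predict lifecycle from the quickstart, stdlib only."""

    def __init__(self):
        self.is_ready = False

    async def warmup(self):
        await asyncio.sleep(0.01)  # stands in for model loading
        self.is_ready = True

    async def predict(self, text: str) -> str:
        if not self.is_ready:
            raise RuntimeError("Service not warmed up.")
        return text.upper()


async def main():
    svc = ToyService()
    try:
        await svc.predict("too early")
    except RuntimeError as exc:
        print(exc)  # Service not warmed up.
    await svc.warmup()  # what create_app's startup hook does for you
    return await svc.predict("hello")


result = asyncio.run(main())
print(result)  # HELLO
```

In the real service, `create_app` wires `warmup` into FastAPI's startup event, so this ordering is handled for you as long as you let the app manage the lifecycle.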