Inference Schema
The `inference-schema` package provides a uniform schema definition for common machine learning applications, specifically designed to aid in web-based ML prediction services. It offers decorators (`@input_schema`, `@output_schema`) that automatically validate and serialize input/output data based on user-defined schemas, integrating well with web frameworks. The current version is 1.8, and it sees periodic updates, often tied to dependency version bumps or feature additions for ML deployments.
Common errors
- marshmallow.exceptions.ValidationError: {'field_name': ['Invalid type.']}
  cause: The input data type for a specific field did not match the type inferred from the `sample_input` schema.
  fix: Verify that the data types in your actual input data (e.g., float vs. int, string vs. number) exactly match the types in your `sample_input` DataFrame or dictionary.
- AttributeError: 'dict' object has no attribute 'tolist'
  cause: The decorated function returned a dictionary, but the `output_schema` implied that a pandas DataFrame was expected, or vice versa, leading to an incompatible method call during serialization.
  fix: Ensure the function's return value conforms to the structure of the `sample_output` provided to `@output_schema`. If `output_schema` expects a list of numbers, convert the DataFrame column (a Series) to a list with `.tolist()`.
- TypeError: Object of type 'DataFrame' is not JSON serializable
  cause: A web framework tried to serialize the return value of your decorated function directly to JSON, but the function returned a `pandas.DataFrame` rather than a JSON-compatible type.
  fix: Have the function itself return JSON-serializable data (nested dictionaries and lists of primitive types) that matches the `sample_output` you pass to `@output_schema`. If your logic produces a `DataFrame`, convert it before returning, e.g. with `df.to_dict(orient='records')` or `df['col'].tolist()`.
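All three errors above come down to runtime data not matching the sample structure or not being JSON-safe. The sketch below is a hypothetical, stdlib-only pre-flight check you could run in tests; `matches_sample` and `is_json_safe` are illustrative names, not part of inference-schema, and the check assumes plain dict/list/primitive samples. It does not reproduce the library's own validation logic.

```python
import json
from typing import Any


def matches_sample(value: Any, sample: Any) -> bool:
    """Hypothetical helper (not inference-schema API): True if `value` has the
    same structure and leaf types as `sample`."""
    if isinstance(sample, dict):
        # Same keys, and each value matches the corresponding sample value
        return (isinstance(value, dict)
                and value.keys() == sample.keys()
                and all(matches_sample(value[k], sample[k]) for k in sample))
    if isinstance(sample, list):
        # Each element must match the first sample element; an empty sample accepts any list
        if not isinstance(value, list):
            return False
        return not sample or all(matches_sample(v, sample[0]) for v in value)
    # Leaf: exact type match, so 30 (int) does not match 40.0 (float)
    return type(value) is type(sample)


def is_json_safe(value: Any) -> bool:
    """Hypothetical helper: True if json.dumps can serialize `value`."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False


sample_output = {'prediction': [40.0, 60.0]}
print(matches_sample({'prediction': [30.0, 50.0]}, sample_output))  # True
print(matches_sample({'prediction': [30, 50]}, sample_output))      # False: ints, not floats
print(is_json_safe({'prediction': {1, 2}}))                          # False: sets are not JSON-serializable
```

The leaf comparison is deliberately strict (exact type, not `isinstance`) to mirror the float vs. int pitfall described in the first error.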
Warnings
- gotcha The `sample_input` and `sample_output` provided to the decorators are critical. They define the *structure and data types* of the expected input and output, not just placeholder values. Mismatches between the actual data at runtime and these samples will cause schema validation errors.
- breaking Inference-schema pins its core dependency, `marshmallow`, to specific version ranges (e.g., `<3.18.0` for v1.8). If your project uses a different `marshmallow` version, it can lead to dependency conflicts or unexpected validation behavior.
- gotcha When using `PandasParameterType`, the decorated function is expected to receive a `pandas.DataFrame` object. If you directly call the function with a different type (e.g., a dictionary or list) without it being processed by the schema, it will likely fail.
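The version range in the breaking-change warning is an example; the pins that actually apply depend on the release you install. One stdlib-only way to list what the installed distribution declares is `importlib.metadata.requires` (Python 3.8+). The helper name `declared_requirements` is hypothetical, and the exact strings printed depend on the package's own metadata.

```python
from importlib.metadata import PackageNotFoundError, requires
from typing import List, Optional


def declared_requirements(dist_name: str = "inference-schema") -> Optional[List[str]]:
    """Return the requirement strings (including version pins) declared by the
    installed distribution, or None if it is not installed."""
    try:
        # requires() returns None when the metadata lists no requirements
        return requires(dist_name) or []
    except PackageNotFoundError:
        return None


reqs = declared_requirements()
if reqs is None:
    print("inference-schema is not installed")
else:
    for req in reqs:
        print(req)  # e.g. a name plus a version specifier, possibly with an extra marker
```

Comparing this output with your project's own constraints before upgrading is a cheap way to spot a conflicting pin early.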
Install
-
pip install inference-schema
Imports
- input_schema
from inference_schema.schema_decorators import input_schema
- output_schema
from inference_schema.schema_decorators import output_schema
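- PandasParameterType
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType
- NumpyParameterType
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType
- StandardPythonParameterType
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType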
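Import failures are easy to confuse with schema errors, especially since the pandas and NumPy parameter types presumably need those libraries installed. A small stdlib helper (hypothetical name `check_imports`) can report which module paths import in the current environment. The inference-schema paths listed are assumptions based on the package's module layout; adjust them to your installed version.

```python
import importlib
from typing import Dict, Iterable


def check_imports(module_paths: Iterable[str]) -> Dict[str, bool]:
    """Try importing each dotted module path; map each path to whether it imported."""
    results = {}
    for path in module_paths:
        try:
            importlib.import_module(path)
            results[path] = True
        except ImportError:  # also covers ModuleNotFoundError, e.g. pandas missing
            results[path] = False
    return results


print(check_imports([
    "inference_schema.schema_decorators",
    "inference_schema.parameter_types.pandas_parameter_type",
    "inference_schema.parameter_types.numpy_parameter_type",
]))
```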
Quickstart
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.pandas_parameter_type import PandasParameterType
from inference_schema.parameter_types.standard_py_parameter_type import StandardPythonParameterType
import pandas as pd
# Define sample input and output data structures
# These samples are used to infer the schema for validation and serialization
sample_input_df = pd.DataFrame({'feature1': [10.0, 20.0], 'feature2': [30.0, 40.0]})
sample_output_dict = {'prediction': [40.0, 60.0]}
# input_schema takes the name of the decorated parameter plus a parameter type;
# output_schema takes a parameter type that wraps the sample output
@input_schema('input_data', PandasParameterType(sample_input_df))
@output_schema(StandardPythonParameterType(sample_output_dict))
def predict(input_data: pd.DataFrame) -> dict:
    """
    A dummy prediction function that takes a DataFrame and returns a dictionary.
    The decorators validate `input_data` against the sample input and describe
    the expected structure of the return value.
    """
    # Example prediction logic: sum of features, converted to a plain list
    predictions = (input_data['feature1'] + input_data['feature2']).tolist()
    return {'prediction': predictions}
# --- Example Usage ---
# This is how you'd typically call it, with input that matches the schema
input_for_prediction = pd.DataFrame({'feature1': [5.0, 15.0], 'feature2': [25.0, 35.0]})
result = predict(input_for_prediction)
print(f"Predicted result: {result}")
# In a web service, the input typically arrives as JSON and is deserialized
# and validated into a DataFrame before it reaches the `predict` function.
# Example: raw_json_input = '{"feature1": [5.0, 15.0], "feature2": [25.0, 35.0]}'
# (framework would parse, inference-schema would validate/convert)
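The comments at the end of the Quickstart describe a parse, predict, serialize round trip. The sketch below mimics that flow without any web framework or inference-schema dependency. `handle_request` and `predict_plain` are hypothetical names, and `predict_plain` works on a column-oriented dict rather than a DataFrame, so it stands in for the decorated `predict` only loosely.

```python
import json
from typing import Any, Callable


def handle_request(raw_body: str, predict_fn: Callable[[Any], Any]) -> str:
    """Hypothetical handler: parse the JSON body, call the prediction function,
    and serialize its result back to JSON."""
    payload = json.loads(raw_body)   # framework step: text -> Python objects
    result = predict_fn(payload)     # a decorated function would convert/validate here
    return json.dumps(result)        # raises TypeError if the result is not JSON-safe


def predict_plain(data: dict) -> dict:
    """Stand-in for `predict` that sums two feature columns given as lists."""
    return {'prediction': [a + b for a, b in zip(data['feature1'], data['feature2'])]}


raw_json_input = '{"feature1": [5.0, 15.0], "feature2": [25.0, 35.0]}'
print(handle_request(raw_json_input, predict_plain))  # {"prediction": [30.0, 50.0]}
```

Keeping the final `json.dumps` inside the handler makes a non-serializable return value (such as a DataFrame) fail loudly at the boundary, which is the TypeError described under Common errors.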