BentoML
BentoML is an open-source framework for building, shipping, and scaling AI applications. It allows developers to create production-ready API endpoints from machine learning models, bundle them into 'Bentos' (deployable archives), and serve them via a unified API server. It currently supports a wide range of ML frameworks and provides tools for model management, API orchestration, and deployment to various platforms. BentoML is actively maintained with frequent patch releases and regular minor version updates.
Common errors
- `ModuleNotFoundError: No module named 'bentoml'`
  cause: The BentoML library is not installed in the active Python environment.
  fix: Run `pip install bentoml` to install the library.
- `bentoml.exceptions.NotFound: Model 'your_model_name:latest' not found`
  cause: The BentoML service or build command cannot find the specified model in the local model store.
  fix: Ensure the model was saved successfully (e.g. with a framework-specific `save_model` call such as `bentoml.picklable_model.save_model` or `bentoml.sklearn.save_model`) and that the service references the correct tag, e.g. via `bentoml.models.get('your_model_name:latest')`. Verify the model name and version with `bentoml models list`.
- `TypeError: Object of type <PydanticModel> is not JSON serializable`
  cause: An API endpoint declared with `output=JSON()` is returning a Pydantic model directly, without serialization.
  fix: Convert the Pydantic model to a dictionary before returning it (e.g. `return my_pydantic_model.dict()`, or `model_dump()` on Pydantic v2).
- `RuntimeError: No event loop is running in current thread.`
  cause: An asynchronous API endpoint (defined with `async def`) or other async operation is being invoked from a synchronous context, without an event loop managed by asyncio/AnyIO.
  fix: Use `await` for async calls inside async functions. For blocking work inside an async API, offload it with `anyio.to_thread.run_sync`; to run a top-level coroutine from synchronous code, use `asyncio.run`.
- `OperationalError: database is locked`
  cause: Concurrent access to the SQLite database BentoML uses for metadata storage, often in high-concurrency scenarios or after `bentoml serve` is killed uncleanly.
  fix: Newer BentoML versions mitigate this with an increased SQLite busy timeout and WAL mode; upgrade to a recent release (>=1.4.36). If it persists, ensure only one process accesses the database at a time, or use a persistent model store backend in production.
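The JSON-serialization fix above can be sketched with plain Pydantic; the `Prediction` class here is illustrative, not part of BentoML:

```python
import json
from pydantic import BaseModel

class Prediction(BaseModel):
    label: str
    score: float

pred = Prediction(label="cat", score=0.97)

# json.dumps(pred) would raise:
#   TypeError: Object of type Prediction is not JSON serializable
# Convert to a plain dict first; Pydantic v2 uses model_dump(), v1 uses dict():
payload = pred.model_dump() if hasattr(pred, "model_dump") else pred.dict()
print(json.dumps(payload))
```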
Warnings
- breaking Major API overhaul in BentoML 1.0. The entire API was redesigned, making 0.x code incompatible with 1.x. Key changes include `BentoService` renamed to `Service`, removal of `bentoml.artifacts` and `bentoml.adapters`, and a new `bentoml.io` module for I/O handling.
- gotcha Model not found errors during `bentoml serve` or `bentoml build` indicate that the service cannot locate the specified model. This often happens if the model was never saved to the local model store, or if the service does not reference the correct model tag.
- gotcha Incorrect resource allocation (CPU, GPU workers) can lead to underutilization or over-provisioning. BentoML defaults to managing workers based on available resources, but for optimal performance, explicit configuration is often needed.
- gotcha When building a Bento, external Python dependencies not explicitly listed in `bentofile.yaml` or `requirements.txt` will be missing in the deployed environment, leading to `ModuleNotFoundError`.
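To avoid the missing-dependency gotcha above, declare every third-party package the service imports in `bentofile.yaml`. A minimal sketch (the package list is illustrative):

```yaml
service: "service:svc"   # import path of the bentoml.Service instance
include:
  - "*.py"               # source files bundled into the Bento
python:
  packages:              # third-party deps installed inside the Bento
    - pydantic
    - scikit-learn
```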
Install
- pip install bentoml
- pip install "bentoml[transformers]"  # for Transformers models
- pip install "bentoml[pytorch]"  # for PyTorch models
Imports
- Service
from bentoml import Service
- bentoml.io
from bentoml.io import JSON, NumpyNdarray, Image # Or other I/O types
- save_model
import bentoml
bentoml.picklable_model.save_model(...)  # or a framework-specific module, e.g. bentoml.sklearn.save_model(...)
- bentoml.BentoService (removed in 1.0)
from bentoml import BentoService  # 0.x only
from bentoml import Service  # 1.x replacement
- bentoml.artifacts (removed in 1.0)
from bentoml.adapters import DataframeInput  # 0.x only
import bentoml  # 1.x: access models via bentoml.models.get or runners attached to the Service
Quickstart
import bentoml
from bentoml.io import JSON
from pydantic import BaseModel

# Define a Pydantic model for input data validation
class InputData(BaseModel):
    name: str
    age: int

# A stand-in for a trained ML model; in a real scenario you would save a
# scikit-learn, PyTorch, etc. model with the matching framework module
class Greeter:
    def predict(self, input_data: dict) -> dict:
        return {"greeting": f"Hello, {input_data['name']}! You are {input_data['age']} years old."}

# Save the object to the local model store as a picklable model
saved_model = bentoml.picklable_model.save_model(
    "greeter_model",
    Greeter(),
    signatures={"predict": {"batchable": False}},
)

# Wrap the saved model in a Runner and attach it to a Service
greeter_runner = bentoml.picklable_model.get("greeter_model:latest").to_runner()
svc = bentoml.Service("greeter_service", runners=[greeter_runner])

# Define an API endpoint; the input/output descriptors set the wire format
@svc.api(input=JSON(pydantic_model=InputData), output=JSON())
async def greet(input_data: InputData) -> dict:
    # Dispatch to the runner; async_run cooperates with the server's event loop
    return await greeter_runner.predict.async_run(input_data.dict())
# To run this service locally:
# 1. Save the code above as `service.py`
# 2. Run in terminal: `bentoml serve service.py:svc --reload`
# 3. Access at http://localhost:3000/greet with a POST request, e.g.:
# curl -X POST -H "Content-Type: application/json" \
# -d '{"name": "Alice", "age": 30}' \
# http://localhost:3000/greet
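The `JSON(pydantic_model=InputData)` descriptor validates the request body before the handler runs, roughly as if the model were invoked directly. A quick sketch using the same `InputData` class, handling both Pydantic v1 and v2 parsing APIs:

```python
from pydantic import BaseModel, ValidationError

class InputData(BaseModel):
    name: str
    age: int

raw = '{"name": "Alice", "age": 30}'
# Pydantic v2 uses model_validate_json(); v1 uses parse_raw()
parse = getattr(InputData, "model_validate_json", None) or InputData.parse_raw
data = parse(raw)
print(data.name, data.age)

# Invalid payloads are rejected before your API handler ever runs
try:
    parse('{"name": "Bob"}')  # missing required field "age"
except ValidationError:
    print("validation failed")
```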