SageMaker Serve
SageMaker Serve is a modular component of the SageMaker Python SDK v3 that simplifies model deployment and inference on Amazon SageMaker. It provides a modern, unified API, centered on the `ModelBuilder` class, for turning trained machine learning models into real-time or batch inference endpoints. Version 1.7.1 is part of the ongoing evolution of the SageMaker Python SDK, replacing legacy interfaces such as `Estimator.deploy()`, `Model`, and `Predictor` with more intuitive, consolidated workflows. It is actively maintained as part of the broader SageMaker SDK.
Warnings
- breaking SageMaker Python SDK V3 (which `sagemaker-serve` is part of) introduces significant breaking changes from V2. Legacy interfaces like `Estimator`, `Model`, and `Predictor` are replaced by unified classes such as `ModelTrainer` (for training) and `ModelBuilder` (for serving/inference).
- gotcha When storing model artifacts in S3 for deployment, avoid organizing them in the S3 console using folders that create 0-byte objects with keys ending in a slash (/). This can violate SageMaker's restrictions on model artifact file names and lead to deployment failures.
- gotcha If you are using custom Docker containers for inference with SageMaker, the `serve` executable (or your custom serving entrypoint script) must be included within your Docker container and its path correctly configured in the container's `PATH` environment variable. SageMaker does not automatically create this executable.
- gotcha Common deployment failures often stem from insufficient AWS IAM permissions. Errors like `AccessDenied` or `UnauthorizedOperation` indicate that the IAM role used by your SageMaker execution environment lacks necessary permissions for S3 (e.g., to access model artifacts or store output) or CloudWatch (for logging).
- gotcha SageMaker endpoints, once deployed, incur costs based on the instance type and duration. Forgetting to delete an endpoint after testing or use can lead to unexpected charges on your AWS bill.
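The cost gotcha above can be mitigated with a small cleanup helper. The sketch below assumes a client shaped like `boto3.client("sagemaker")` — `list_endpoints(NameContains=...)` and `delete_endpoint(EndpointName=...)` are real boto3 operations — while the prefix-based naming convention is purely illustrative.

```python
def delete_endpoints_with_prefix(sm_client, prefix):
    """Delete every SageMaker endpoint whose name starts with `prefix`.

    `sm_client` must behave like boto3.client("sagemaker"): it exposes
    list_endpoints(NameContains=...) and delete_endpoint(EndpointName=...).
    Returns the list of endpoint names that were deleted. Note that
    list_endpoints paginates; for many endpoints, follow NextToken.
    """
    deleted = []
    response = sm_client.list_endpoints(NameContains=prefix)
    for summary in response.get("Endpoints", []):
        name = summary["EndpointName"]
        # NameContains matches substrings, so re-check the prefix explicitly.
        if name.startswith(prefix):
            sm_client.delete_endpoint(EndpointName=name)
            deleted.append(name)
    return deleted

# Real usage (requires AWS credentials):
#   import boto3
#   delete_endpoints_with_prefix(boto3.client("sagemaker"), "model-quickstart-")
```

Taking the client as a parameter also makes the helper easy to exercise with a stub in tests, without touching a live AWS account.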
Install
- pip install sagemaker-serve
- pip install sagemaker
Imports
- ModelBuilder
from sagemaker.serve.model_builder import ModelBuilder
- SchemaBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
- ModelServer
from sagemaker.serve.utils.types import ModelServer
- Estimator
N/A (Use ModelTrainer or ModelBuilder in V3)
Quickstart
import os
import sagemaker
from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.utils.types import ModelServer
from sagemaker.core.helper.session_helper import Session, get_execution_role
import joblib
from sklearn.linear_model import LogisticRegression
import numpy as np
import tarfile
# 1. Setup Session and Role
try:
    sagemaker_session = Session()
    role = get_execution_role()
except ValueError:
    print("Could not get SageMaker execution role. Ensure you are running in a "
          "SageMaker environment or have AWS credentials configured.")
    # Fallback for local testing or CI without a full SageMaker environment.
    # A real deployment needs valid credentials and a valid IAM role ARN; the
    # placeholders below only let the rest of the script be exercised.
    if os.environ.get("AWS_ACCESS_KEY_ID") is None:
        print("WARNING: No AWS credentials found. Deployment will fail.")
    sagemaker_session = sagemaker.Session()  # falls back to the default boto3 credential chain
    role = "arn:aws:iam::123456789012:role/FakeSageMakerRole"  # placeholder; actual deployment will fail
# 2. Train a dummy model and save it
x = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([0, 1, 0])
model = LogisticRegression()
model.fit(x, y)
model_filename = "model.joblib"
joblib.dump(model, model_filename)
# Create a tar.gz archive of the model
model_tar_path = "model.tar.gz"
with tarfile.open(model_tar_path, "w:gz") as tar:
    tar.add(model_filename)
# Upload the model to S3 (requires valid credentials and an IAM role with S3 access).
# In a real scenario, model_path would point to a pre-trained model already in S3.
try:
    s3_model_path = sagemaker_session.upload_data(path=model_tar_path, key_prefix="model-quickstart")
    print(f"Model uploaded to: {s3_model_path}")
except Exception as e:
    print(f"Could not upload model to S3: {e}. Using dummy path for illustration.")
    s3_model_path = "s3://your-bucket/path/to/model.tar.gz"  # Placeholder
# 3. Define schema for input/output
schema_builder = SchemaBuilder(
    sample_input=np.array([[1.0, 2.0]], dtype=np.float32),
    sample_output=np.array([0], dtype=np.int64),
)
# 4. Initialize ModelBuilder
model_builder = ModelBuilder(
    model_path=s3_model_path,
    role=role,
    sagemaker_session=sagemaker_session,
    schema_builder=schema_builder,
    model_server=ModelServer.TORCHSERVE,  # other options include TENSORFLOW_SERVING, TRITON, DJL_SERVING
)
# 5. Build and deploy the model (requires actual AWS permissions).
# This step creates a SageMaker endpoint and will incur costs -- make sure to clean up.
endpoint_name = None
try:
    print("Building model...")
    built_model = model_builder.build()
    print("Deploying model...")
    predictor = built_model.deploy(instance_type="ml.m5.large", initial_instance_count=1)
    endpoint_name = predictor.endpoint_name
    print(f"Model deployed to endpoint: {endpoint_name}")
    # 6. Make a prediction
    test_data = np.array([[0.5, 0.3]], dtype=np.float32)
    prediction = predictor.predict(test_data)
    print(f"Prediction: {prediction}")
except Exception as e:
    print(f"Deployment or invocation failed: {e}")
    print("Ensure your AWS credentials and IAM role are correctly configured with the necessary permissions,")
    print("including permission for the SageMaker execution role to read the model artifact from S3.")
finally:
    # 7. Clean up (crucial for cost management)
    if endpoint_name:
        print(f"Deleting endpoint: {endpoint_name}")
        try:
            sagemaker_session.delete_endpoint(endpoint_name)
            print(f"Endpoint {endpoint_name} deleted.")
        except Exception as e:
            print(f"Failed to delete endpoint {endpoint_name}: {e}")
    else:
        print("No endpoint to delete; deployment did not complete.")
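The packaging step above (step 2) is a common source of deployment failures: SageMaker extracts `model.tar.gz` into `/opt/ml/model`, so artifacts should sit at the archive root rather than under a nested directory, and keys ending in `/` (per the S3 gotcha above) must be avoided. A stdlib-only sketch of packaging plus a sanity check — the function names here are illustrative, not part of the SDK:

```python
import os
import tarfile

def package_model(artifact_paths, tar_path="model.tar.gz"):
    """Bundle model artifacts into a tar.gz with every file at the archive root."""
    with tarfile.open(tar_path, "w:gz") as tar:
        for path in artifact_paths:
            # arcname strips directory components so each file lands at the root,
            # which is where SageMaker expects it after extraction to /opt/ml/model.
            tar.add(path, arcname=os.path.basename(path))
    return tar_path

def artifacts_at_root(tar_path):
    """Return True if no file in the archive is nested under a directory."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return all("/" not in m.name.strip("./")
                   for m in tar.getmembers() if m.isfile())
```

Running `artifacts_at_root` before uploading to S3 catches the nested-directory mistake locally, before it surfaces as an opaque endpoint failure.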