SageMaker Serve

1.7.1 · active · verified Sat Apr 11

SageMaker Serve is a modular component of the SageMaker Python SDK v3, designed to simplify model deployment and inference on Amazon SageMaker. It provides a modern, unified API, primarily through the `ModelBuilder` class, to streamline the process of taking trained machine learning models and creating real-time or batch inference endpoints. The `ModelBuilder` workflow replaces legacy interfaces such as `Estimator.deploy()`, `Model`, and `Predictor` with a single, more intuitive entry point. Version 1.7.1 is part of the ongoing evolution of the SageMaker Python SDK and is actively maintained as part of the broader SDK.
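To give a feel for what the `SchemaBuilder` step in the quickstart below does, here is a purely conceptual sketch of selecting a wire format from sample payloads. `infer_content_type` is a hypothetical helper invented for illustration; it is not part of the SageMaker SDK, which performs a richer version of this inference internally.

```python
def infer_content_type(sample):
    """Conceptual stand-in for the serializer selection that a schema
    builder performs from a sample payload. Illustration only."""
    if isinstance(sample, (bytes, bytearray)):
        return "application/octet-stream"
    if isinstance(sample, str):
        return "text/csv"
    if isinstance(sample, (dict, list)):
        return "application/json"
    raise TypeError(f"no serializer for {type(sample).__name__}")

print(infer_content_type([[1.0, 2.0]]))  # application/json
print(infer_content_type("1.0,2.0"))     # text/csv
```

The real `SchemaBuilder` additionally handles NumPy arrays, pandas objects, and custom translators, but the core idea is the same: sample input/output values determine how requests and responses are serialized at the endpoint.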

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to use `sagemaker-serve` to deploy a simple scikit-learn model to a SageMaker endpoint. It covers creating a dummy model, saving it, uploading to S3, defining an input/output schema with `SchemaBuilder`, initializing `ModelBuilder`, deploying the model, invoking the endpoint for a prediction, and critically, cleaning up the created SageMaker endpoint to avoid incurring unnecessary costs. For a real deployment, ensure a valid AWS IAM role and credentials are configured.

import os
import sagemaker
from sagemaker.serve.model_builder import ModelBuilder
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.utils.types import ModelServer
from sagemaker.core.helper.session_helper import Session, get_execution_role

import joblib
from sklearn.linear_model import LogisticRegression
import numpy as np
import tarfile

# 1. Setup Session and Role
try:
    sagemaker_session = Session()
    role = get_execution_role()
except ValueError:
    print("Could not get SageMaker execution role. Ensure you are running in a SageMaker environment or have AWS credentials configured.")
    # Fallback for local testing or CI without a full SageMaker environment
    aws_access_key_id = os.environ.get('AWS_ACCESS_KEY_ID', 'DUMMY_KEY')
    aws_secret_access_key = os.environ.get('AWS_SECRET_ACCESS_KEY', 'DUMMY_SECRET')
    aws_session_token = os.environ.get('AWS_SESSION_TOKEN', '') # Optional
    aws_region = os.environ.get('AWS_REGION', 'us-east-1')

    # If actual credentials aren't available, we can't truly deploy but can simulate setup.
    # For a runnable example that deploys, real credentials/role are mandatory.
    # This part is mostly for local validation of the code structure.
    if aws_access_key_id == 'DUMMY_KEY':
        print("WARNING: Dummy AWS credentials are in use. Deployment will fail without valid credentials.")
    import boto3  # only needed for this credential-fallback path
    sagemaker_session = sagemaker.Session(
        boto_session=boto3.Session(
            aws_access_key_id=aws_access_key_id,
            aws_secret_access_key=aws_secret_access_key,
            aws_session_token=aws_session_token or None,
            region_name=aws_region,
        )
    )
    # For a real deployment, 'role' must be a valid IAM role ARN.
    # This dummy role will fail actual deployment.
    role = "arn:aws:iam::123456789012:role/FakeSageMakerRole"

# 2. Train a dummy model and save it
x = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([0, 1, 0])
model = LogisticRegression()
model.fit(x, y)

model_filename = "model.joblib"
joblib.dump(model, model_filename)

# Create a tar.gz archive of the model
model_tar_path = "model.tar.gz"
with tarfile.open(model_tar_path, "w:gz") as tar:
    # arcname keeps the artifact at the archive root, where SageMaker
    # expects it when the tarball is unpacked on the endpoint.
    tar.add(model_filename, arcname=model_filename)

# Upload model to S3 (requires valid role/credentials)
# For a real scenario, model_path would point to a pre-trained model in S3.
# For this quickstart, we'll try to upload the dummy model.
# If role is dummy, this S3 upload will fail but ModelBuilder setup can proceed conceptually.
try:
    s3_model_path = sagemaker_session.upload_data(path=model_tar_path, key_prefix="model-quickstart")
    print(f"Model uploaded to: {s3_model_path}")
except Exception as e:
    print(f"Could not upload model to S3: {e}. Using dummy path for illustration.")
    s3_model_path = "s3://your-bucket/path/to/model.tar.gz" # Placeholder

# 3. Define schema for input/output
schema_builder = SchemaBuilder(
    sample_input=np.array([[1.0, 2.0]], dtype=np.float32),
    sample_output=np.array([0], dtype=np.int64)
)

# 4. Initialize ModelBuilder
model_builder = ModelBuilder(
    model_path=s3_model_path,
    role=role,
    sagemaker_session=sagemaker_session,
    schema_builder=schema_builder,
    model_server=ModelServer.TORCHSERVE  # TorchServe serves the joblib artifact; TRITON, DJL_SERVING, etc. are alternatives
)

# 5. Build and deploy model (requires actual AWS permissions)
# This step will create a SageMaker endpoint and will incur costs.
# Make sure to clean up.
endpoint_name = None
try:
    print("Building model...")
    built_model = model_builder.build()
    print("Deploying model...")
    predictor = built_model.deploy(instance_type="ml.m5.large", initial_instance_count=1)
    endpoint_name = predictor.endpoint_name
    print(f"Model deployed to endpoint: {endpoint_name}")

    # 6. Make a prediction
    test_data = np.array([[0.5, 0.3]], dtype=np.float32)
    prediction = predictor.predict(test_data)
    print(f"Prediction: {prediction}")

except Exception as e:
    print(f"Deployment or invocation failed: {e}")
    print("Ensure your AWS credentials and IAM role are correctly configured and have necessary permissions.")
    print("Also ensure your SageMaker execution role has permission to upload to S3.")
finally:
    # 7. Clean up (CRUCIAL for cost management)
    if endpoint_name:
        print(f"Deleting endpoint: {endpoint_name}")
        try:
            sagemaker_session.delete_endpoint(endpoint_name)
            print(f"Endpoint {endpoint_name} deleted.")
        except Exception as e:
            print(f"Failed to delete endpoint {endpoint_name}: {e}")
    else:
        print("No endpoint was created, so there is nothing to clean up.")
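A note on the packaging step above: SageMaker extracts `model.tar.gz` into `/opt/ml/model` on the serving container and expects the artifact at the archive root. This standalone sketch (using a stand-in byte string instead of a real joblib dump) verifies that layout without touching AWS:

```python
import os
import tarfile
import tempfile

# Build a model.tar.gz the same way the quickstart does, with a
# stand-in file in place of model.joblib, then inspect the layout.
with tempfile.TemporaryDirectory() as tmp:
    artifact = os.path.join(tmp, "model.joblib")
    with open(artifact, "wb") as f:
        f.write(b"dummy model bytes")

    tar_path = os.path.join(tmp, "model.tar.gz")
    with tarfile.open(tar_path, "w:gz") as tar:
        # arcname places the file at the archive root, which is where
        # SageMaker looks for artifacts after unpacking the tarball.
        tar.add(artifact, arcname="model.joblib")

    with tarfile.open(tar_path, "r:gz") as tar:
        names = tar.getnames()

print(names)  # ['model.joblib']
```

If the artifact ends up nested (e.g. `tmp/.../model.joblib` instead of `model.joblib`), the endpoint container will fail to locate the model at startup, which is a common cause of deployment failures.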
