{"id":3795,"library":"sagemaker-serve","title":"SageMaker Serve","description":"SageMaker Serve is a modular component of the SageMaker Python SDK v3, designed to simplify model deployment and inference on Amazon SageMaker. It provides a modern, unified API, centered on the `ModelBuilder` class, that streamlines taking trained machine learning models to real-time or batch inference endpoints. Version 1.7.1 is part of the ongoing evolution of the SageMaker Python SDK, replacing legacy interfaces such as `Estimator.deploy()`, `Model`, and `Predictor` with more intuitive, consolidated workflows. It is actively maintained as part of the broader SageMaker SDK.","status":"active","version":"1.7.1","language":"en","source_language":"en","source_url":"https://github.com/aws/sagemaker-python-sdk","tags":["aws","sagemaker","mlops","model serving","inference","cloud","machine learning","deployment"],"install":[{"cmd":"pip install sagemaker-serve","lang":"bash","label":"Install only sagemaker-serve"},{"cmd":"pip install sagemaker","lang":"bash","label":"Install full SageMaker Python SDK (includes sagemaker-serve)"}],"dependencies":[{"reason":"Required Python version","package":"python","optional":false},{"reason":"Underlying SageMaker SDK for low-level resource management","package":"sagemaker-core","optional":false}],"imports":[{"symbol":"ModelBuilder","correct":"from sagemaker.serve.model_builder import ModelBuilder"},{"note":"Used for defining model input/output schemas.","symbol":"SchemaBuilder","correct":"from sagemaker.serve.builder.schema_builder import SchemaBuilder"},{"note":"Used for specifying the desired model server (e.g., TorchServe, Triton).","symbol":"ModelServer","correct":"from sagemaker.serve.utils.types import ModelServer"},{"note":"Estimator, Model, and Predictor are legacy interfaces largely replaced by ModelTrainer and ModelBuilder in SageMaker Python SDK v3.","wrong":"from sagemaker.estimator import Estimator","symbol":"Estimator","correct":"N/A (Use ModelTrainer or ModelBuilder in V3)"}],"quickstart":{"code":"import os\nimport boto3\nimport sagemaker\nfrom sagemaker.serve.model_builder import ModelBuilder\nfrom sagemaker.serve.builder.schema_builder import SchemaBuilder\nfrom sagemaker.serve.utils.types import ModelServer\nfrom sagemaker.core.helper.session_helper import Session, get_execution_role\n\nimport joblib\nfrom sklearn.linear_model import LogisticRegression\nimport numpy as np\nimport tarfile\n\n# 1. Set up session and role\ntry:\n    sagemaker_session = Session()\n    role = get_execution_role()\nexcept ValueError:\n    print(\"Could not get SageMaker execution role. Ensure you are running in a SageMaker environment or have AWS credentials configured.\")\n    # Fallback for local testing or CI without a full SageMaker environment.\n    # A real deployment requires valid AWS credentials and an IAM role ARN.\n    aws_region = os.environ.get(\"AWS_REGION\", \"us-east-1\")\n    if not os.environ.get(\"AWS_ACCESS_KEY_ID\"):\n        print(\"WARNING: No AWS credentials found. Deployment will fail without valid credentials.\")\n    sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name=aws_region))\n    # This placeholder role will fail actual deployment; substitute a valid IAM role ARN.\n    role = \"arn:aws:iam::123456789012:role/FakeSageMakerRole\"\n\n# 2. Train a dummy model and save it\nx = np.array([[1, 2], [3, 4], [5, 6]])\ny = np.array([0, 1, 0])\nmodel = LogisticRegression()\nmodel.fit(x, y)\n\nmodel_filename = \"model.joblib\"\njoblib.dump(model, model_filename)\n\n# Create a tar.gz archive of the model\nmodel_tar_path = \"model.tar.gz\"\nwith tarfile.open(model_tar_path, \"w:gz\") as tar:\n    tar.add(model_filename)\n\n# Upload the model to S3 (requires valid credentials).\n# In a real scenario, model_path would point to a pre-trained model in S3.\ntry:\n    s3_model_path = sagemaker_session.upload_data(path=model_tar_path, key_prefix=\"model-quickstart\")\n    print(f\"Model uploaded to: {s3_model_path}\")\nexcept Exception as e:\n    print(f\"Could not upload model to S3: {e}. Using a placeholder path for illustration.\")\n    s3_model_path = \"s3://your-bucket/path/to/model.tar.gz\"  # Placeholder\n\n# 3. Define the input/output schema\nschema_builder = SchemaBuilder(\n    sample_input=np.array([[1.0, 2.0]], dtype=np.float32),\n    sample_output=np.array([0], dtype=np.int64)\n)\n\n# 4. Initialize ModelBuilder (the execution role is passed as role_arn)\nmodel_builder = ModelBuilder(\n    model_path=s3_model_path,\n    role_arn=role,\n    sagemaker_session=sagemaker_session,\n    schema_builder=schema_builder,\n    model_server=ModelServer.TORCHSERVE  # the default server; TENSORFLOW_SERVING, TRITON, etc. are also available\n)\n\n# 5. Build and deploy the model (requires actual AWS permissions).\n# This step creates a SageMaker endpoint and incurs costs, so be sure to clean up.\nendpoint_name = None\ntry:\n    print(\"Building model...\")\n    built_model = model_builder.build()\n    print(\"Deploying model...\")\n    predictor = built_model.deploy(instance_type=\"ml.m5.large\", initial_instance_count=1)\n    endpoint_name = predictor.endpoint_name\n    print(f\"Model deployed to endpoint: {endpoint_name}\")\n\n    # 6. Make a prediction\n    test_data = np.array([[0.5, 0.3]], dtype=np.float32)\n    prediction = predictor.predict(test_data)\n    print(f\"Prediction: {prediction}\")\n\nexcept Exception as e:\n    print(f\"Deployment or invocation failed: {e}\")\n    print(\"Ensure your AWS credentials and IAM role are correctly configured and have the necessary permissions, including S3 upload access.\")\nfinally:\n    # 7. Clean up (crucial for cost management)\n    if endpoint_name:\n        print(f\"Deleting endpoint: {endpoint_name}\")\n        try:\n            sagemaker_session.delete_endpoint(endpoint_name)\n            print(f\"Endpoint {endpoint_name} deleted.\")\n        except Exception as e:\n            print(f\"Failed to delete endpoint {endpoint_name}: {e}\")\n    else:\n        print(\"No endpoint to delete; deployment may have failed.\")\n","lang":"python","description":"This quickstart demonstrates how to use `sagemaker-serve` to deploy a simple scikit-learn model to a SageMaker endpoint. It covers training a dummy model, archiving and uploading it to S3, defining an input/output schema with `SchemaBuilder`, initializing `ModelBuilder`, deploying the model, invoking the endpoint for a prediction, and, critically, deleting the endpoint to avoid unnecessary costs. For a real deployment, a valid AWS IAM role and credentials are required."},"warnings":[{"fix":"Migrate your code to the new V3 classes, particularly `sagemaker.serve.model_builder.ModelBuilder` for model deployment and inference. Refer to the SageMaker Python SDK V3 migration guide.","message":"SageMaker Python SDK V3 (which `sagemaker-serve` is part of) introduces significant breaking changes from V2. Legacy interfaces like `Estimator`, `Model`, and `Predictor` are replaced by unified classes such as `ModelTrainer` (for training) and `ModelBuilder` (for serving/inference).","severity":"breaking","affected_versions":"All versions of SageMaker Python SDK V3 and above (including sagemaker-serve 1.x.x)"},{"fix":"Ensure that your model artifacts are correctly packaged (e.g., as a single `.tar.gz` archive) and that their S3 prefix contains no such 'folder' objects. SageMaker expects a specific directory structure or a single archive file.","message":"When storing model artifacts in S3, avoid organizing them in the S3 console using folders, which creates 0-byte objects with keys ending in a slash (/). These objects can violate SageMaker's restrictions on model artifact file names and lead to deployment failures.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure your Dockerfile copies your serving script(s) into the container (e.g., `/opt/program/serve`) and makes them executable (`chmod +x /opt/program/serve`). The entry point or command in your Dockerfile should correctly invoke this script.","message":"If you use a custom Docker container for inference with SageMaker, the `serve` executable (or your custom serving entrypoint script) must be included in the container and reachable via the container's `PATH` environment variable. SageMaker does not create this executable for you.","severity":"gotcha","affected_versions":"All versions when using custom containers"},{"fix":"Verify that the IAM role associated with your SageMaker notebook, training job, or deployment has appropriate policies attached, granting `s3:GetObject`, `s3:PutObject`, `s3:ListBucket`, `cloudwatch:PutMetricData`, and `logs:CreateLogGroup`/`logs:PutLogEvents` permissions, among others relevant to your workflow.","message":"Common deployment failures often stem from insufficient AWS IAM permissions. Errors like `AccessDenied` or `UnauthorizedOperation` indicate that the IAM role used by your SageMaker execution environment lacks the necessary permissions for S3 (e.g., to read model artifacts or store output) or CloudWatch (for logging).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always delete SageMaker endpoints once they are no longer needed, e.g., via `predictor.delete_endpoint()` or the AWS Console/CLI, and integrate cleanup into your development and CI/CD workflows.","message":"SageMaker endpoints, once deployed, incur costs based on the instance type and duration. Forgetting to delete an endpoint after testing or use can lead to unexpected charges on your AWS bill.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}