{"id":8617,"library":"sagemaker-inference","title":"SageMaker Inference Toolkit","description":"The sagemaker-inference toolkit is an open-source Python library that simplifies building serving containers for machine learning models on Amazon SageMaker. It provides a model serving stack built on Multi Model Server (MMS), enabling users to implement custom inference logic with minimal boilerplate. The current version is 1.10.1, with a regular release cadence covering bug fixes and new features, including support for newer Python versions and improved dependency management.","status":"active","version":"1.10.1","language":"en","source_language":"en","source_url":"https://github.com/aws/sagemaker-inference-toolkit/","tags":["aws","sagemaker","inference","machine learning","mms","container"],"install":[{"cmd":"pip install sagemaker-inference","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"The inference toolkit's serving stack is built on Multi Model Server (MMS). Although MMS is not a direct Python dependency of `sagemaker-inference` itself, it must be installed in the Docker container for the toolkit to function.","package":"multi-model-server","optional":false},{"reason":"Version 1.9.1 relaxed the version pin on 'retrying', which is used internally for robustness, particularly when starting the model server.","package":"retrying","optional":true}],"imports":[{"note":"Base class for custom inference handlers, commonly extended to implement `model_fn`, `input_fn`, `predict_fn`, and `output_fn`.","symbol":"DefaultInferenceHandler","correct":"from sagemaker_inference.default_inference_handler import DefaultInferenceHandler"},{"note":"Used to start the underlying model server within the container's entrypoint.","symbol":"model_server","correct":"from sagemaker_inference import model_server"},{"note":"Used in the handler service to wrap the custom inference handler.","symbol":"Transformer","correct":"from sagemaker_inference.transformer import Transformer"},{"note":"Base class for the handler service, which orchestrates the model server and the inference handler.","symbol":"DefaultHandlerService","correct":"from sagemaker_inference.default_handler_service import DefaultHandlerService"}],"quickstart":{"code":"import json\n\nfrom sagemaker_inference.default_inference_handler import DefaultInferenceHandler\nfrom sagemaker_inference import content_types, encoder\n\nclass CustomInferenceHandler(DefaultInferenceHandler):\n    def default_model_fn(self, model_dir, context=None):\n        \"\"\"Loads a dummy model for demonstration. In a real scenario, this would load\n        your actual trained model from `model_dir`.\n        \"\"\"\n        print(f\"Loading model from: {model_dir}\")\n        # Simulate loading a model artifact.\n        # For example, if you had a 'model.pkl' in model_dir:\n        # model_path = os.path.join(model_dir, 'model.pkl')\n        # model = joblib.load(model_path)\n        return {\"status\": \"model_loaded\", \"path\": model_dir}\n\n    def default_input_fn(self, input_data, content_type, context=None):\n        \"\"\"Deserializes the input data from the request. Supports JSON and CSV.\"\"\"\n        if content_type == content_types.JSON:\n            # json.loads preserves the request's dict structure; the toolkit's\n            # decoder.decode would convert a JSON payload to a NumPy array instead.\n            return json.loads(input_data)\n        elif content_type == content_types.CSV:\n            # Assuming CSV is a simple string for this example\n            return input_data.decode('utf-8').split(',')\n        else:\n            raise ValueError(f\"Unsupported content type: {content_type}\")\n\n    def default_predict_fn(self, data, model, context=None):\n        \"\"\"Makes a dummy prediction based on the input data and the loaded model.\"\"\"\n        print(f\"Performing prediction with model: {model} and data: {data}\")\n        if isinstance(data, dict) and 'instances' in data:\n            # Assume a common inference request format\n            predictions = [item * 2 for item in data['instances']]\n        elif isinstance(data, list):\n            predictions = [item + \"_processed\" for item in data]\n        else:\n            predictions = f\"Processed: {data}\"\n        return {\"predictions\": predictions}\n\n    def default_output_fn(self, prediction, accept, context=None):\n        \"\"\"Serializes the prediction result to the requested accept type. Supports JSON.\"\"\"\n        if accept == content_types.JSON:\n            return encoder.encode(prediction, accept)\n        raise ValueError(f\"Unsupported accept type: {accept}\")\n\n# To run this in a SageMaker container, you would have a Dockerfile that installs\n# sagemaker-inference and multi-model-server, copies this file as 'inference.py',\n# and sets up an entrypoint that starts the model server,\n# e.g. via sagemaker_inference.model_server.start_model_server().\n\n# Example of how to manually test the handler (not typically run directly in a quickstart)\nif __name__ == '__main__':\n    handler = CustomInferenceHandler()\n    model = handler.default_model_fn('/opt/ml/model')  # Simulates model_dir\n\n    test_json_input = json.dumps({\"instances\": [1, 2, 3]}).encode('utf-8')\n    json_data = handler.default_input_fn(test_json_input, content_types.JSON)\n    json_prediction = handler.default_predict_fn(json_data, model)\n    json_output = handler.default_output_fn(json_prediction, content_types.JSON)\n    # encoder.encode returns a JSON string for the JSON content type\n    print(f\"JSON Inference Result: {json_output}\")\n\n    test_csv_input = b'hello,world'\n    csv_data = handler.default_input_fn(test_csv_input, content_types.CSV)\n    csv_prediction = handler.default_predict_fn(csv_data, model)\n    csv_output = handler.default_output_fn(csv_prediction, content_types.JSON)  # Output as JSON for simplicity\n    print(f\"CSV Inference Result: {csv_output}\")\n","lang":"python","description":"This quickstart demonstrates the core pattern for using `sagemaker-inference` to create a custom inference handler. It defines a `CustomInferenceHandler` class that extends `DefaultInferenceHandler`, overriding `default_model_fn`, `default_input_fn`, `default_predict_fn`, and `default_output_fn`. These methods are responsible for loading the model, deserializing input, making predictions, and serializing output, respectively. This file would typically be part of your model archive within a SageMaker custom container."},"warnings":[{"fix":"Review your custom inference handler function signatures. If `context` is not used, you can omit it. If you wish to use it, ensure your signatures include `context=None` as the last parameter (e.g., `def model_fn(model_dir, context=None):`).","message":"Handler functions (`model_fn`, `input_fn`, `predict_fn`, `output_fn`) were updated in v1.7.0 to optionally accept a `context` object. If you relied on strict function signatures without `context` in older versions, this update may require changes; signatures that omit `context` remain supported if it is not needed.","severity":"breaking","affected_versions":">=1.7.0"},{"fix":"Upgrade `sagemaker-inference` to version 1.10.1 or higher, which includes a fix for this zombie-process exception. Also ensure your base Docker image and `psutil` version are compatible and up to date.","message":"Persistent 'psutil.ZombieProcess: PID still exists but it's a zombie' errors can occur, leading to endpoint instability or restarts. This was a known issue with specific `psutil` versions and PyTorch inference images.","severity":"gotcha","affected_versions":"<1.10.1"},{"fix":"Add `RUN pip install multi-model-server sagemaker-inference` to your Dockerfile. Also ensure your container exposes port 8080 and handles the `/ping` and `/invocations` routes.","message":"When using custom Docker containers with `sagemaker-inference`, Multi Model Server (MMS) must be explicitly installed within your Dockerfile. Forgetting this can cause the model server to fail to start.","severity":"gotcha","affected_versions":"All versions"},{"fix":"For `sagemaker-inference` versions 1.10.0 and above, configure CodeArtifact access using the environment variables detailed in the AWS documentation. For older versions or other private repositories, you may need to configure `pip` manually with `--extra-index-url` or vendor the dependencies.","message":"Custom Python dependencies specified in `requirements.txt` might fail to install if they are hosted in a private repository such as AWS CodeArtifact without proper configuration.","severity":"gotcha","affected_versions":"<1.10.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Update `sagemaker-inference` to version 1.10.1 or newer. If using PyTorch, ensure your PyTorch inference DLC version is recent enough to include the fix.","cause":"A race condition in process monitoring within the model server, related to the `psutil` library, caused the inference process to incorrectly identify a running process as a zombie. This was particularly prevalent in certain PyTorch inference containers.","error":"psutil.ZombieProcess: PID still exists but it's a zombie"},{"fix":"Increase the `SAGEMAKER_MAX_REQUEST_SIZE` environment variable for your endpoint. If the payload is small, check `SAGEMAKER_MODEL_SERVER_TIMEOUT_SECONDS` (introduced in v1.9.3) and `InvocationTimeoutSeconds` on the endpoint, and increase the worker count if necessary.","cause":"This error (HTTP 413 Payload Too Large) typically indicates that the inference request payload exceeds the server's configured size limit, or, less commonly, that a worker timed out even though the payload was within limits.","error":"ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (413) from primary and could not load the entire response body"},{"fix":"Ensure `pip install sagemaker-inference` is executed in your Dockerfile (for containers) or local development environment. If using a custom Dockerfile, verify that the `RUN pip install ...` command appears before your Python code runs.","cause":"The `sagemaker-inference` library is not installed in the Python environment of your SageMaker container or local setup.","error":"ModuleNotFoundError: No module named 'sagemaker_inference'"}]}