{"id":8618,"library":"sagemaker-training","title":"SageMaker Training Toolkit","description":"The `sagemaker-training` library provides the core toolkit that runs inside Amazon SageMaker training containers. It parses hyperparameters and the input data configuration, installs and executes user training scripts, and exposes the input data and model directories (e.g., `/opt/ml/input/data/<channel>` and `/opt/ml/model`) that SageMaker populates before training and uploads afterward. The current version is 5.1.1; minor versions are released every few weeks to months, and major versions less frequently.","status":"active","version":"5.1.1","language":"en","source_language":"en","source_url":"https://github.com/aws/sagemaker-training-toolkit/","tags":["aws","sagemaker","ml","machine-learning","container","training"],"install":[{"cmd":"pip install sagemaker-training","lang":"bash","label":"Install core library"}],"dependencies":[{"reason":"Required for internal data serialization; the `protobuf` upgrade in `sagemaker-training` v5.0.0 can cause conflicts with other libraries.","package":"protobuf","optional":false},{"reason":"Used for interacting with AWS services like S3. 
While not a direct dependency of the core `sagemaker-training` package in all scenarios, version mismatches can cause runtime issues for functionality like S3 data transfer.","package":"boto3","optional":true}],"imports":[{"note":"The most common usage is to import the `environment` module to access training environment details.","wrong":"import sagemaker_training","symbol":"environment","correct":"from sagemaker_training import environment"},{"note":"The training environment is represented by the `Environment` class in the `environment` module; instantiate it instead of calling a top-level helper function.","wrong":"from sagemaker_training import get_environment","symbol":"Environment","correct":"from sagemaker_training.environment import Environment"},{"note":"Hyperparameters are exposed through the `hyperparameters` attribute of an `Environment` instance (they are also set as `SM_HP_*` environment variables).","wrong":"sagemaker_training.get_hyperparameters()","symbol":"hyperparameters","correct":"environment.Environment().hyperparameters"}],"quickstart":{"code":"import os\n\nfrom sagemaker_training import environment\n\ndef train():\n    # Get SageMaker training environment details\n    env = environment.Environment()\n\n    # Access hyperparameters; cast explicitly in case a value arrives as a string\n    hyperparameters = env.hyperparameters\n    learning_rate = float(hyperparameters.get('learning_rate', 0.01))\n\n    # Access input data paths (assumes the job defines a channel named 'training')\n    train_data_path = os.path.join(env.channel_input_dirs['training'], 'data.csv')\n\n    # Access model output path; SageMaker uploads its contents after training\n    model_dir = env.model_dir\n\n    print(f\"Learning Rate: {learning_rate}\")\n    print(f\"Training data path: {train_data_path}\")\n    print(f\"Model output directory: {model_dir}\")\n\n    # Your training logic here\n    # Example: Save a dummy model artifact\n    with open(os.path.join(model_dir, 'model.txt'), 'w') as f:\n        f.write('My trained model output')\n\nif __name__ == '__main__':\n    train()\n","lang":"python","description":"This quickstart demonstrates a typical SageMaker training script entry point. 
It uses `sagemaker_training.environment` to retrieve the hyperparameters and input/output paths that user code needs when running inside a SageMaker training container. The script should be placed at the root of your training code archive."},"warnings":[{"fix":"Ensure all your dependencies, including those provided by SageMaker's base images, are compatible with `protobuf>=5.0.0`. You may need to pin specific versions of conflicting libraries or use a different base image if conflicts persist.","message":"Version 5.0.0 updated the required `protobuf` dependency to version 5.28.1. This can cause significant conflicts and runtime errors (`AttributeError`, `TypeError`) if your custom training environment or other dependencies rely on an older (v3 or v4) `protobuf` version.","severity":"breaking","affected_versions":">=5.0.0"},{"fix":"When testing locally, either mock `sagemaker_training.environment.Environment` or, if your script reads `SM_*` environment variables directly, set dummy values for the ones it uses (e.g., `SM_MODEL_DIR`, `SM_INPUT_DATA_CONFIG`, `SM_HYPERPARAMETERS`).","message":"The `sagemaker-training` library is designed to run *inside* the SageMaker training container. Running scripts locally that rely on `sagemaker_training.environment` without mocking it or providing the configuration it expects (the files under `/opt/ml/input/config` or the corresponding environment variables) results in errors (e.g., `KeyError` for missing environment variables, or incorrect paths).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Regularly update `boto3` to the latest compatible version. 
If you encounter S3-related issues, verify that `boto3` versions are consistent across your environment and the toolkit, and consider explicitly pinning a compatible `boto3` version in your `requirements.txt`.","message":"Mismatched `boto3` versions between the SageMaker Training Toolkit and your custom code or base image can lead to issues with S3 interactions (e.g., downloading data, uploading model artifacts) or credential handling.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Make sure the toolkit knows which script to run: in a custom Dockerfile, copy the script into `/opt/ml/code/` and set the `SAGEMAKER_PROGRAM` environment variable to its file name (e.g., `ENV SAGEMAKER_PROGRAM train.py`). When using SageMaker Python SDK estimators, the `entry_point` and `source_dir` arguments handle this automatically.","message":"The training toolkit expects your user script to be under `/opt/ml/code/` (e.g., `/opt/ml/code/train.py`) within the container. Custom entrypoints or Dockerfiles that deviate from this structure without proper configuration can lead to `FileNotFoundError` or the script not being executed.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install sagemaker-training` in your environment. If using a custom Dockerfile, add `RUN pip install sagemaker-training`.","cause":"The `sagemaker-training` library is not installed in the environment where the script is being executed (e.g., local machine or custom Docker container without proper installation).","error":"ModuleNotFoundError: No module named 'sagemaker_training'"},{"fix":"Check which `protobuf` version is installed (e.g., with `pip show protobuf`) and ensure all libraries in your environment are compatible with `protobuf>=5.0.0`. 
You may need to upgrade other dependencies or explicitly require a v5 release of `protobuf` (e.g., `protobuf>=5.28.1`).","cause":"This error typically occurs when `protobuf` v5+ is installed, but another library (or an older version of `sagemaker-training` itself) expects an API from `protobuf` v3 or v4. This was a common issue after `sagemaker-training` v5.0.0's `protobuf` upgrade.","error":"AttributeError: module 'google.protobuf.descriptor' has no attribute '_HAS_OPTIONAL_FIELD_ACCESSORS'"},{"fix":"Read optional hyperparameters with a default, e.g., `os.environ.get('SM_HP_YOUR_HYPERPARAMETER', default_value)` or `env.hyperparameters.get('your_hyperparameter', default_value)`. Double-check that the hyperparameter name passed to the SageMaker estimator matches the key used in your script.","cause":"Reading the `SM_HP_YOUR_HYPERPARAMETER` environment variable directly (e.g., `os.environ['SM_HP_YOUR_HYPERPARAMETER']`) when that hyperparameter was not provided to the SageMaker training job. Indexing `env.hyperparameters['your_hyperparameter']` fails the same way, with the hyperparameter name itself as the missing key.","error":"KeyError: 'SM_HP_YOUR_HYPERPARAMETER'"},{"fix":"Ensure the `inputs` passed to `estimator.fit()` map your S3 data to the expected channel name (e.g., 'training'). Verify that the file exists in your S3 bucket and that your script uses the correct path derived from `env.channel_input_dirs['channel_name']`.","cause":"The script is trying to access an input file or directory that does not exist at the specified path within the SageMaker container, or the input data channel was not correctly configured.","error":"FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/training/my_file.csv'"}]}