{"library":"sagemaker-datawrangler","title":"Amazon SageMaker Data Wrangler Library","description":"Amazon SageMaker Data Wrangler is a feature within Amazon SageMaker Studio Classic (and now integrated into SageMaker Canvas) that provides a visual interface for end-to-end data preparation for machine learning. It allows users to import, prepare, transform, featurize, and analyze data with little to no coding, offering over 300 built-in transformations. Users build 'data flows' graphically, which can then be exported as Python code, SageMaker Pipelines, or Data Wrangler jobs for automated ML workflows. The current PyPI version is 0.4.3; releases typically track SageMaker Studio and Canvas updates.","language":"python","status":"active","last_verified":"Fri May 15","install":{"commands":["pip install sagemaker-datawrangler"],"cli":null},"imports":["from sagemaker.processing import Processor\nfrom sagemaker.processing import ScriptProcessor\n# Or, if using the high-level Data Wrangler specific processor (depends on export method)\n# from sagemaker.wrangler.processing import DataWranglerProcessor"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import os\nimport sagemaker  # needed for the sagemaker.processing.ProcessingInput/ProcessingOutput references below\nfrom sagemaker.processing import Processor, ScriptProcessor\nfrom sagemaker.s3 import S3Uploader\n\n# This quickstart demonstrates how to execute an *exported* Data Wrangler flow.\n# Data Wrangler flows are typically created and exported from the SageMaker Studio UI.\n# The 'flow_file.flow' and 'transformation_script.py' are hypothetical outputs from a Data Wrangler export.\n\n# Set up S3 bucket for input/output and flow file\nbucket = os.environ.get('SAGEMAKER_BUCKET', 'your-sagemaker-default-bucket') # Replace with your S3 bucket\nrole = os.environ.get('SAGEMAKER_ROLE', 'arn:aws:iam::123456789012:role/SageMakerExecutionRole') # Replace with your SageMaker execution role\n\nflow_file_s3_uri = 
S3Uploader.upload('path/to/local/flow_file.flow', f's3://{bucket}/data-wrangler-flows/')\ninput_data_s3_uri = S3Uploader.upload('path/to/local/input_data.csv', f's3://{bucket}/data-wrangler-inputs/')\noutput_data_s3_uri = f's3://{bucket}/data-wrangler-outputs/'\n\n# Option 1: Run a Data Wrangler .flow file directly as a Processing Job\n# This requires a Data Wrangler-specific container image.\n# You would typically get the image URI from SageMaker documentation or your AWS account.\n# dw_image_uri = 'your-data-wrangler-processing-image-uri'\n# dw_processor = Processor(\n#     role=role,\n#     image_uri=dw_image_uri,\n#     instance_count=1,\n#     instance_type='ml.m5.xlarge',\n#     max_runtime_in_seconds=3600\n# )\n# \n# dw_processor.run(\n#     inputs=[sagemaker.processing.ProcessingInput(source=input_data_s3_uri, destination='/opt/ml/processing/input')],\n#     outputs=[sagemaker.processing.ProcessingOutput(source='/opt/ml/processing/output', destination=output_data_s3_uri)],\n#     arguments=['--flow', flow_file_s3_uri, '--output-uri', output_data_s3_uri]\n# )\n\n# Option 2: Run a Python script exported from Data Wrangler as a ScriptProcessor\n# This assumes Data Wrangler exported a Python script that encapsulates the transformations.\n# You would need to ensure the script is self-contained or has necessary dependencies.\n\n# Placeholder for a Python script that would be generated by Data Wrangler export.\n# Example content for 'transformation_script.py':\n# import pandas as pd\n# import argparse\n# import os\n# \n# if __name__ == '__main__':\n#     parser = argparse.ArgumentParser()\n#     parser.add_argument('--input-path', type=str, default='/opt/ml/processing/input/input_data.csv')\n#     parser.add_argument('--output-path', type=str, default='/opt/ml/processing/output/transformed_data.csv')\n#     args = parser.parse_args()\n# \n#     df = pd.read_csv(args.input_path)\n#     # Apply your Data Wrangler transformations here, e.g.,\n#     df['new_feature'] 
= df['existing_feature'] * 2\n#     df.to_csv(args.output_path, index=False)\n\nscript_processor = ScriptProcessor(\n    role=role,\n    image_uri='your-sagemaker-processing-python-image-uri', # e.g., '683313688378.dkr.ecr.us-east-1.amazonaws.com/sagemaker-python-sdk:latest-cpu-py310'\n    command=['python3'],\n    instance_count=1,\n    instance_type='ml.m5.xlarge',\n    max_runtime_in_seconds=3600\n)\n\n# Upload the transformation script\nS3Uploader.upload('path/to/local/transformation_script.py', f's3://{bucket}/data-wrangler-scripts/')\n\nscript_processor.run(\n    code=f's3://{bucket}/data-wrangler-scripts/transformation_script.py',\n    inputs=[\n        sagemaker.processing.ProcessingInput(source=input_data_s3_uri, destination='/opt/ml/processing/input')\n    ],\n    outputs=[\n        sagemaker.processing.ProcessingOutput(source='/opt/ml/processing/output', destination=output_data_s3_uri)\n    ],\n    arguments=['--input-path', '/opt/ml/processing/input/input_data.csv', '--output-path', '/opt/ml/processing/output/transformed_data.csv']\n)\n\nprint(f\"Data Wrangler processing job launched. Output will be in: {output_data_s3_uri}\")","lang":"python","description":"This quickstart demonstrates how to programmatically execute a data preparation flow defined and exported from SageMaker Data Wrangler. Since Data Wrangler is a UI-driven tool, the Python library `sagemaker-datawrangler` itself doesn't offer direct transformation functions. Instead, you typically export your flow (either as a `.flow` file or a Python script) and then use the SageMaker Python SDK to run it as a SageMaker Processing Job. 
This example outlines how to set up and run such a processing job, requiring an S3 bucket for inputs, outputs, and the exported flow/script, as well as an appropriate IAM role.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-15","installed_version":"0.4.3","pypi_latest":"0.4.3","is_stale":false,"summary":{"python_range":"3.9–3.13","success_rate":50,"avg_install_s":29.6,"avg_import_s":null,"wheel_type":"sdist"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"sagemaker-datawrangler","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"sagemaker-datawrangler","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":29,"import_time_s":null,"mem_mb":null,"disk_size":"625M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"sagemaker-datawrangler","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"sagemaker-datawrangler","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":29.3,"import_time_s":null,"mem_mb":null,"disk_size":"661M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"sagemaker-datawrangler","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim 
(glibc)","variant":"sagemaker-datawrangler","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":28.2,"import_time_s":null,"mem_mb":null,"disk_size":"648M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"sagemaker-datawrangler","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"sagemaker-datawrangler","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":28.8,"import_time_s":null,"mem_mb":null,"disk_size":"646M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"sagemaker-datawrangler","exit_code":1,"wheel_type":null,"failure_reason":"build_error","import_side_effects":null,"install_time_s":null,"import_time_s":null,"mem_mb":null,"disk_size":null},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"sagemaker-datawrangler","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"broken","install_time_s":32.6,"import_time_s":null,"mem_mb":null,"disk_size":"617M"}]}}