{"id":5466,"library":"sagemaker-feature-store-pyspark-3-1","title":"Amazon SageMaker FeatureStore PySpark Bindings","description":"The Amazon SageMaker Feature Store PySpark bindings let Spark jobs batch-ingest Spark DataFrames into SageMaker Feature Store feature groups. This package targets Spark 3.1 environments. The current version is 1.1.3; updates are released as needed for new features or compatibility.","status":"active","version":"1.1.3","language":"en","source_language":"en","source_url":"https://github.com/aws/sagemaker-feature-store-pyspark-sdk","tags":["aws","sagemaker","feature-store","pyspark","machine-learning"],"install":[{"cmd":"pip install sagemaker-feature-store-pyspark-3-1","lang":"bash","label":"Install library"}],"dependencies":[{"reason":"Required for all Spark functionality; the library targets PySpark 3.1.","package":"pyspark","optional":false}],"imports":[{"symbol":"FeatureStoreManager","correct":"from feature_store_pyspark.FeatureStoreManager import FeatureStoreManager"}],"quickstart":{"code":"import os\nimport feature_store_pyspark\nfrom pyspark.sql import SparkSession\nfrom feature_store_pyspark.FeatureStoreManager import FeatureStoreManager\n\n# IMPORTANT: For local PySpark execution, the connector JARs bundled with the Python package\n# must be on the Spark classpath; classpath_jars() returns their paths.\n# Ensure PySpark 3.1 and Java are installed and configured for your environment.\nspark = SparkSession.builder \\\n    .appName(\"FeatureStorePySparkQuickstart\") \\\n    .config(\"spark.jars\", \",\".join(feature_store_pyspark.classpath_jars())) \\\n    .getOrCreate()\n\n# Replace with your actual Feature Group name, AWS region, and Feature Group ARN\n# (123456789012 is a placeholder account ID).\nfeature_group_name = os.environ.get('SAGEMAKER_FEATURE_GROUP_NAME', 'your-feature-group-name')\naws_region = os.environ.get('AWS_REGION', 'us-east-1')\nfeature_group_arn = os.environ.get(\n    'SAGEMAKER_FEATURE_GROUP_ARN',\n    f'arn:aws:sagemaker:{aws_region}:123456789012:feature-group/{feature_group_name}'\n)\n\n# Initialize FeatureStoreManager\n# AWS credentials are sourced from the Spark environment (IAM role, or AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY).\nfs_manager = FeatureStoreManager()\n\ntry:\n    # Placeholder rows: column names and types must match the Feature Group's feature definitions.\n    df = spark.createDataFrame(\n        [('record-1', '2024-01-01T00:00:00Z', 1.0)],\n        ['record_id', 'event_time', 'feature_1']\n    )\n    # Ingest the DataFrame into the Feature Group\n    fs_manager.ingest_data(input_data_frame=df, feature_group_arn=feature_group_arn)\n    print(f\"Ingested data into Feature Group: {feature_group_name}\")\n\nexcept Exception as e:\n    print(f\"Error interacting with Feature Group {feature_group_name}: {e}\")\n    print(\"Troubleshooting: Ensure the Feature Group exists, credentials are set, and the connector JARs are on the Spark classpath ('spark.jars').\")\nfinally:\n    spark.stop()","lang":"python","description":"Demonstrates how to initialize the FeatureStoreManager and ingest a Spark DataFrame into a SageMaker Feature Group using PySpark. The DataFrame columns and the account ID in the ARN are placeholders. For local execution, ensure your SparkSession has the connector JARs on its classpath."},"warnings":[{"fix":"Ensure your Spark environment (e.g., EMR, Glue, local PySpark setup) is running Spark 3.1.x.","message":"This library is compiled against Spark 3.1 and AWS SDK for Java 2.17. Using it with a different Spark or Java SDK version may lead to runtime errors or unexpected behavior.","severity":"breaking","affected_versions":"All versions of sagemaker-feature-store-pyspark-3-1"},{"fix":"Verify that the IAM role or user associated with your Spark job has the required SageMaker Feature Store permissions.","message":"Missing IAM permissions are a common cause of errors when interacting with Amazon SageMaker Feature Store. The IAM role or credentials used by your Spark environment must allow `sagemaker:GetRecord`, `sagemaker:PutRecord`, and any other actions your job needs on the target Feature Groups.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Include `.config('spark.jars', ','.join(feature_store_pyspark.classpath_jars()))` in your `SparkSession.builder` for local/custom setups.","message":"When running PySpark locally or on custom clusters, the Java JARs for the SageMaker Feature Store connector must be added to the SparkSession explicitly via `spark.jars`; otherwise the job fails with `ClassNotFoundException` errors.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}