{"id":14453,"library":"awsglue3-local","title":"AWS Glue Local Development","description":"The `awsglue3-local` package is a Python utility that facilitates local development of AWS Glue 3.0 jobs. It aims to simplify the setup of a local PySpark environment that mimics the Glue 3.0 runtime, allowing developers to test Glue scripts outside of the AWS cloud. The latest release is version 1.0.0. Releases are irregular, typically tied to Glue version compatibility.","status":"active","version":"1.0.0","language":"en","source_language":"en","source_url":"https://github.com/aws/aws-glue-libs","tags":["aws","glue","pyspark","local-development","etl"],"install":[{"cmd":"pip install awsglue3-local","lang":"bash","label":"Install the package"}],"dependencies":[],"imports":[{"note":"Standard import for GlueContext in a Glue environment.","wrong":"from glue.context import GlueContext","symbol":"GlueContext","correct":"from awsglue.context import GlueContext"},{"note":"Standard PySpark SparkSession import. 
awsglue3-local configures the environment.","symbol":"SparkSession","correct":"from pyspark.sql import SparkSession"},{"note":"Used to parse job arguments from the Glue environment.","wrong":"from glue.utils import getResolvedOptions","symbol":"getResolvedOptions","correct":"from awsglue.utils import getResolvedOptions"},{"note":"Core Glue data structure for ETL operations.","symbol":"DynamicFrame","correct":"from awsglue.dynamicframe import DynamicFrame"}],"quickstart":{"code":"import sys\nfrom awsglue.utils import getResolvedOptions\nfrom pyspark.context import SparkContext\nfrom awsglue.context import GlueContext\nfrom awsglue.job import Job\n\n# This mimics how Glue passes job arguments.\n# getResolvedOptions raises if a required argument is missing, so when running\n# locally, supply a dummy value first, e.g.:\n#     sys.argv.extend(['--JOB_NAME', 'local_test'])\nargs = getResolvedOptions(sys.argv, ['JOB_NAME'])\n\nsc = SparkContext()\nglueContext = GlueContext(sc)\nspark = glueContext.spark_session\njob = Job(glueContext)\njob.init(args['JOB_NAME'], args)\n\n# Example: Create a simple Spark DataFrame\ndata = [(\"Alice\", 1), (\"Bob\", 2)]\ndf = spark.createDataFrame(data, [\"Name\", \"Id\"])\ndf.show()\n\n# Example: Use Glue DynamicFrame (requires more setup for actual data sources)\n# try:\n#     from awsglue.dynamicframe import DynamicFrame\n#     # Reading real data would typically involve S3, JDBC, etc.\n#     # For a purely local test, convert a Spark DataFrame to a DynamicFrame\n#     dynamic_frame = DynamicFrame.fromDF(df, glueContext, \"example_df\")\n#     dynamic_frame.printSchema()\n# except ImportError:\n#     print(\"awsglue.dynamicframe is not fully functional in this minimal local setup without the full Glue libs.\")\n\nprint(\"Glue job finished locally.\")\n\njob.commit()","lang":"python","description":"A basic AWS Glue job script demonstrating the initialization of `GlueContext`, `SparkSession`, and parsing job arguments using `getResolvedOptions`. 
This code assumes `awsglue3-local` has correctly set up the environment so these imports resolve. Note that full `DynamicFrame` functionality often requires additional Glue libraries and configuration; `awsglue3-local` aims to provide these, but specific connectors may need extra setup."},"warnings":[{"fix":"Be aware that local execution with `awsglue3-local` provides an approximation of the Glue runtime. For full fidelity, consider using official AWS Glue Docker images or `spark-submit` with `aws-glue-libs` on the classpath.","message":"The `awsglue` module is typically part of the AWS Glue runtime and is not fully pip-installable as a complete, standalone library providing all native Glue functionality. `awsglue3-local` aims to provide the necessary environment and stub modules to allow standard Glue job scripts to run locally, but some features (e.g., direct S3/JDBC connectors without specific Hadoop/Spark configuration) might still require additional setup or behave differently.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always perform final validation and testing in the actual AWS Glue environment. Use local testing to validate logic and syntax, not subtle performance or integration behavior.","message":"Local Glue development environments, including those set up with `awsglue3-local`, can exhibit behavioral differences compared to the actual AWS Glue cloud environment. 
These discrepancies can stem from differences in Spark configuration, underlying libraries, resource management, or specific Glue service integrations not fully replicated locally.","severity":"breaking","affected_versions":"All versions"},{"fix":"For local testing, ensure you either pass dummy arguments (e.g., `sys.argv.extend(['--JOB_NAME', 'my_local_job'])`) or handle missing arguments gracefully in your script, e.g., by providing default values or checking for argument existence.","message":"When using `getResolvedOptions`, if job arguments are not provided (e.g., when running the script directly without passing `--JOB_NAME` on the command line), it will raise an error indicating required arguments are missing. This is a common pitfall in local development.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Ensure `awsglue3-local` is correctly installed. If running with PySpark, confirm that the Glue libraries (the `aws-glue-libs` JARs) are included in your Spark classpath. `awsglue3-local` should ideally handle this, but manual intervention might be needed for complex setups.","cause":"The Python environment does not have the `awsglue` module accessible on its `sys.path`, or `awsglue3-local` failed to properly configure the environment.","error":"ModuleNotFoundError: No module named 'awsglue.context'"},{"fix":"Ensure your local Spark environment (or the environment configured by `awsglue3-local`) includes the correct Hadoop-AWS JARs. 
For `pyspark` directly, this often involves `spark-submit --packages org.apache.hadoop:hadoop-aws:x.y.z ...`.","cause":"This typically indicates that the Hadoop AWS S3 connector JARs are missing from your Spark classpath; they are required for interacting with S3 buckets from Spark/Glue.","error":"java.lang.ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem"},{"fix":"Verify that `GlueContext` is correctly initialized (`GlueContext(sc)`) and that `job.init()` is called. Ensure the Glue-specific Spark extension JARs are on the classpath. Sometimes restarting the Spark session helps.","cause":"This error often occurs when `DynamicFrame` operations are attempted without a fully initialized Glue context, or when there are underlying Spark/JVM issues with the Glue extensions. It can also happen if the `awsglue` libraries are not properly linked.","error":"Py4JJavaError: An error occurred while calling o72.fromDF."}],"ecosystem":"pypi"}