AWS Glue Local Development

1.0.0 · active · verified Fri Apr 17

The `awsglue3-local` package is a Python utility that facilitates local development of AWS Glue 3.0 jobs. It aims to simplify setting up a local PySpark environment that mimics the Glue 3.0 runtime, so developers can test Glue scripts outside the AWS cloud. The latest release is version 1.0.0. Releases are irregular and are typically driven by the need to stay compatible with Glue versions.
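Because the goal is a local environment that mimics the Glue 3.0 runtime, it can help to check that parity directly before running any job. The following is a minimal sketch that is not part of `awsglue3-local`. It assumes the runtime versions AWS documents for Glue 3.0 (Apache Spark 3.1.1 on Python 3.7), and the function names `major_minor` and `glue3_parity_problems` are hypothetical.

```python
import importlib.util
import sys

# Assumed Glue 3.0 runtime: Apache Spark 3.1.1 on Python 3.7 (compare major.minor only).
GLUE3_SPARK = (3, 1)
GLUE3_PYTHON = (3, 7)


def major_minor(version):
    """Turn a version string such as '3.1.1' or '3.1.1-amzn-0' into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def glue3_parity_problems(spark_version, python_version, modules=("pyspark", "awsglue")):
    """Return human-readable mismatches between this environment and Glue 3.0."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("module not importable: " + name)
    if spark_version is not None and major_minor(spark_version) != GLUE3_SPARK:
        problems.append("Spark %s differs from Glue 3.0 (3.1.x)" % spark_version)
    if tuple(python_version[:2]) != GLUE3_PYTHON:
        problems.append("Python %d.%d differs from Glue 3.0 (3.7)" % tuple(python_version[:2]))
    return problems


if __name__ == "__main__":
    try:
        import pyspark
        spark_version = pyspark.__version__
    except ImportError:
        spark_version = None  # reported by the module check instead
    for problem in glue3_parity_problems(spark_version, sys.version_info):
        print("WARNING:", problem)
```

Run it with the same interpreter you use for your Glue scripts; no output means the module and version checks found no mismatch.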

Quickstart

A basic AWS Glue job script that initializes `SparkContext`, `GlueContext`, and `SparkSession`, and parses job arguments with `getResolvedOptions`. It assumes `awsglue3-local` has set up the environment so that these imports resolve. Full `DynamicFrame` functionality often requires additional Glue libraries and configuration; `awsglue3-local` aims to provide these, but specific connectors may still need more setup.

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

# getResolvedOptions reads arguments the way Glue passes them: as --NAME value pairs.
# Locally, supply them on the command line, e.g. python job.py --JOB_NAME local_test.
# It fails with an error if a requested option such as JOB_NAME is missing.
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

# Reuse a running SparkContext if there is one (e.g. in a notebook); otherwise create it.
sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Example: Create a simple Spark DataFrame
data = [("Alice", 1), ("Bob", 2)]
df = spark.createDataFrame(data, ["Name", "Id"])
df.show()

# Example: Use Glue DynamicFrame (requires more setup for actual data sources)
# try:
#     from awsglue.dynamicframe import DynamicFrame
#     # This part would typically involve reading from S3, JDBC, etc.
#     # For a truly local test, you might convert a Spark DataFrame to DynamicFrame
#     dynamic_frame = DynamicFrame.fromDF(df, glueContext, "example_df")
#     dynamic_frame.printSchema()
# except ImportError:
#     print("awsglue.dynamicframe not fully functional in this minimal local setup without full Glue libs.")

print("Glue job finished locally.")

job.commit()
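`getResolvedOptions` expects each requested option as a `--NAME value` pair in `sys.argv`, so the quickstart can be run locally as `python job.py --JOB_NAME local_test`. When you want to unit-test argument handling in an environment where `awsglue` is not importable, a stand-in can help. The sketch below is an approximation, not the real helper: `resolve_options_locally` is a hypothetical name, it ignores unrecognised arguments because Glue typically passes extra arguments of its own, and on a missing option it exits through argparse's usual error handling, whereas the real helper's error type may differ.

```python
import argparse


def resolve_options_locally(argv, options):
    """Hypothetical stand-in for awsglue.utils.getResolvedOptions.

    Each name in `options` becomes a required '--NAME value' argument.
    Unrecognised arguments are ignored rather than rejected.
    """
    parser = argparse.ArgumentParser(allow_abbrev=False)
    for name in options:
        parser.add_argument("--" + name, required=True)
    # argv[0] is the script name, as in sys.argv.
    known, _unknown = parser.parse_known_args(argv[1:])
    return vars(known)


# Prefer the real helper when awsglue is importable; fall back to the sketch otherwise.
try:
    from awsglue.utils import getResolvedOptions
except ImportError:
    getResolvedOptions = resolve_options_locally
```

With this in place, `resolve_options_locally(["job.py", "--JOB_NAME", "local_test"], ["JOB_NAME"])` returns `{"JOB_NAME": "local_test"}`.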
