{"id":6314,"library":"awsglue-dev","title":"AWS Glue Development Library (awsglue-dev)","description":"awsglue-dev provides Python interfaces to the AWS Glue ETL library, primarily for local development, IDE auto-completion, and local script validation. It extends Apache Spark with additional data types and operations for ETL workflows. The package version `2021.12.30` is part of an ecosystem that facilitates authoring scripts for AWS Glue, a fully managed, serverless ETL service.","status":"active","version":"2021.12.30","language":"en","source_language":"en","source_url":"https://github.com/awslabs/aws-glue-libs","tags":["AWS Glue","ETL","Spark","PySpark","local development","cloud","data integration"],"install":[{"cmd":"pip install awsglue-dev","lang":"bash","label":"Install core library"},{"cmd":"pip install pyspark","lang":"bash","label":"Install required dependency for local execution"}],"dependencies":[{"reason":"Required for local development and to successfully import and use GlueContext and SparkContext, as awsglue-dev extends PySpark.","package":"pyspark","optional":false}],"imports":[{"symbol":"GlueContext","correct":"from awsglue.context import GlueContext"},{"symbol":"SparkContext","correct":"from pyspark.context import SparkContext"},{"symbol":"Job","correct":"from awsglue.job import Job"},{"symbol":"getResolvedOptions","correct":"from awsglue.utils import getResolvedOptions"},{"symbol":"DynamicFrame","correct":"from awsglue.dynamicframe import DynamicFrame"},{"note":"Commonly, individual transform classes are imported directly or using `*` for convenience in Glue scripts, not the parent module.","wrong":"from awsglue import transforms","symbol":"* (transforms)","correct":"from awsglue.transforms import *"}],"quickstart":{"code":"import sys\nfrom pyspark.context import SparkContext\nfrom awsglue.context import GlueContext\nfrom awsglue.job import Job\nfrom awsglue.utils import getResolvedOptions\n\n# These parameters are typically passed by AWS Glue 
service\n# For local development, you might set dummy values or omit this if not testing getResolvedOptions\nargs = getResolvedOptions(sys.argv, ['JOB_NAME'])\n\nsc = SparkContext()\nglueContext = GlueContext(sc)\nspark = glueContext.spark_session\njob = Job(glueContext)\njob.init(args['JOB_NAME'], args)\n\n# Your Glue ETL script logic would go here\n# For example, to create a DynamicFrame:\n# from awsglue.dynamicframe import DynamicFrame\n# dynamic_frame = glueContext.create_dynamic_frame.from_options(\n#     connection_type='s3',\n#     connection_options={'paths': ['s3://your-bucket/your-data/'], 'recurse': True},\n#     format='json'\n# )\n\nprint(f\"Initialized GlueContext and SparkSession for job: {args['JOB_NAME']}\")\n# Don't forget job.commit() at the end of a real Glue job\n# job.commit()\n","lang":"python","description":"This quickstart demonstrates the foundational boilerplate for an AWS Glue ETL script: initializing the SparkContext, GlueContext, and Job objects. While `awsglue-dev` provides the interfaces locally, actual ETL execution typically requires a Glue environment (e.g., a Docker container or the AWS Glue service) to run successfully against real data sources. The `getResolvedOptions` function handles job parameters, which are central to Glue job execution."},"warnings":[{"fix":"Do not expect scripts using `awsglue-dev` to be fully functional locally without a full AWS Glue runtime setup (e.g., the official AWS Glue Docker images) or deployment to the AWS Glue service. Use it for developing and testing logic, not full local execution.","message":"The `awsglue-dev` package primarily offers Python interfaces for local development (e.g., IDE auto-completion, static analysis). 
Actual AWS Glue ETL scripts built with these interfaces *must be executed within the AWS Glue service* or a compatible local Docker environment that includes the Glue Spark runtime JARs.","severity":"breaking","affected_versions":"All versions"},{"fix":"Ensure `pip install pyspark` is run in your development environment alongside `pip install awsglue-dev`.","message":"For local development with `awsglue-dev`, the `pyspark` library is a mandatory peer dependency and must be installed separately. Without it, core components like `SparkContext` and `GlueContext` will not function.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Explicitly implement partitioning in your ETL job scripts, especially when writing `DynamicFrame` or `DataFrame` outputs, to leverage performance optimizations like partition pruning. Refer to AWS Glue documentation for `repartition` or `write` options.","message":"AWS Glue job scripts, by default, may not automatically partition output data when writing to target data sources. This can lead to poor performance on large datasets.","severity":"gotcha","affected_versions":"All versions (especially for Glue-generated scripts)"},{"fix":"For production Glue jobs, explicitly manage dependencies using `requirements.txt` and ensure packages are updated to secure versions. Regularly audit your dependencies using security scanning tools.","message":"AWS Glue environments (which `awsglue-dev` mirrors) often ship with a set of pre-installed Python packages, some of which may contain known vulnerabilities or be outdated. Relying solely on these default versions can pose security risks.","severity":"gotcha","affected_versions":"All versions of AWS Glue, reflected in local development environments"},{"fix":"Consult the official AWS Glue migration guides for each specific version upgrade. Test your Glue scripts thoroughly against the target Glue version in a development environment before deploying to production. 
Pay attention to Spark and Python compatibility.","message":"Migrating AWS Glue jobs between major Glue versions (e.g., from Glue 2.0/3.0 to 4.0/5.0) can introduce breaking changes due to underlying Spark version upgrades, changes in supported Python versions, or deprecation of certain libraries/APIs. Scripts developed with `awsglue-dev` might need adjustments.","severity":"breaking","affected_versions":"When transitioning between AWS Glue major service versions (e.g., 2.0, 3.0, 4.0, 5.0)"}],"env_vars":null,"last_verified":"2026-04-15T00:00:00.000Z","next_check":"2026-07-14T00:00:00.000Z"}