{"id":5451,"library":"quinn","title":"Quinn PySpark Utilities","description":"Quinn is a Python library providing helper methods for PySpark to enhance developer productivity. It offers DataFrame validation functions, useful column functions/DataFrame transformations, and performant helper functions. The library is currently at version 0.10.3 and maintains an active release cadence.","status":"active","version":"0.10.3","language":"en","source_language":"en","source_url":"https://github.com/MrPowers/quinn/","tags":["pyspark","spark","dataframe","etl","data-processing","utilities","data-validation"],"install":[{"cmd":"pip install quinn","lang":"bash","label":"Install with pip"}],"dependencies":[{"reason":"Quinn is a utility library built on top of Apache PySpark and requires it for all functionality.","package":"pyspark"}],"imports":[{"note":"Import the main quinn library for DataFrame transformations and helper functions.","symbol":"quinn","correct":"import quinn"},{"note":"The `extensions` module patches `create_df` onto `SparkSession` and methods such as `isTruthy` onto `Column` when imported with `*`. 
Using specific imports for individual extensions is less common as the primary intention is to enhance the PySpark environment broadly.","wrong":"from quinn.extensions import some_specific_extension_function","symbol":"extensions","correct":"from quinn.extensions import *"},{"note":"This is a standard PySpark best practice and crucial for working with Column expressions, which quinn often extends or uses.","symbol":"F","correct":"from pyspark.sql import functions as F"}],"quickstart":{"code":"from pyspark.sql import SparkSession\nfrom pyspark.sql.types import StringType, IntegerType\nimport quinn\nfrom quinn.extensions import *\n\n# Initialize SparkSession\nspark = SparkSession.builder \\\n    .appName(\"QuinnQuickstart\") \\\n    .master(\"local[*]\") \\\n    .getOrCreate()\n\n# NOTE: `spark.create_df` and Column methods like `isTruthy()` are automatically\n# available after `from quinn.extensions import *`\n\n# Create a DataFrame using quinn's extended create_df method\ndata = [\n    (\"Alice\", 1, \"USA\"),\n    (\"Bob\", 2, \"Canada\"),\n    (\"Charlie\", 3, \"Mexico\")\n]\n# Each schema entry is a (name, DataType, nullable) tuple\nschema_def = [\n    (\"firstName\", StringType(), True),\n    (\"id\", IntegerType(), True),\n    (\"country\", StringType(), True)\n]\ndf = spark.create_df(data, schema_def)\nprint(\"Original DataFrame Schema:\")\ndf.printSchema()\nprint(\"Original DataFrame Data:\")\ndf.show()\n\n# Apply a quinn DataFrame transformation: snake_case_col_names\nsnake_cased_df = quinn.snake_case_col_names(df)\nprint(\"\\nDataFrame with snake_cased columns:\")\nsnake_cased_df.printSchema()\nsnake_cased_df.show()\n\n# Demonstrate a Column extension (e.g., isTruthy from quinn.extensions)\nfrom pyspark.sql import functions as F\nextended_df = df.withColumn(\"is_id_truthy\", F.col(\"id\").isTruthy())\nprint(\"\\nDataFrame with 'is_id_truthy' column (using quinn extension):\")\nextended_df.show()\n\n# Stop SparkSession\nspark.stop()","lang":"python","description":"This quickstart demonstrates how to 
initialize a SparkSession, create a DataFrame using `quinn`'s extended `create_df` method, apply a common DataFrame transformation like `snake_case_col_names`, and utilize a Column extension such as `isTruthy`."},"warnings":[{"fix":"Users migrating from versions prior to 0.2.0 will need to update their import statements and potentially function calls to align with the new module structure. Refer to the GitHub releases for details.","message":"Version 0.2.0 introduced significant breaking changes to the directory structure and import interfaces for PySpark extensions and functions.","severity":"breaking","affected_versions":"<0.2.0"},{"fix":"Avoid using `print_athena_create_table`. Check the latest documentation for alternative methods to generate Athena table DDL or construct it manually.","message":"The `print_athena_create_table` functionality has been deprecated.","severity":"deprecated","affected_versions":"0.10.3+"},{"fix":"For the main `quinn` library, prefer `import quinn` and then call functions as `quinn.function_name()`. For extensions, `from quinn.extensions import *` is often intended for the automatic patching, but be aware of the namespace pollution. If only specific extension functions are needed and not the automatic patching, consider importing them directly, though this might miss the intended auto-patching behavior.","message":"Using wildcard imports (`from quinn import *` for the main `quinn` module, or even `from quinn.extensions import *` if only specific functions are needed) can make it difficult to trace where functions originate, potentially leading to name collisions.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always remember that your DataFrame transformations won't run until an action is triggered. Use actions strategically for debugging (e.g., `df.show()`) and ensure your job design accounts for this lazy execution model.","message":"PySpark operations (including those using `quinn`) are lazily evaluated. 
Transformations build a logical plan and are only executed when an action (e.g., `show()`, `collect()`, `write.save()`) is called. This is a common pitfall for Python developers used to immediate execution.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-13T00:00:00.000Z","next_check":"2026-07-12T00:00:00.000Z"}