Quinn PySpark Utilities

0.10.3 · active · verified Mon Apr 13

Quinn is a Python library of helper methods for PySpark that enhance developer productivity. It offers DataFrame validation functions, reusable column functions and DataFrame transformations, and performance-minded helpers. The library is currently at version 0.10.3 and maintains an active release cadence.

Install

pip install quinn

Imports

import quinn
from quinn.extensions import *

Quickstart

This quickstart demonstrates how to initialize a SparkSession, create a DataFrame with `quinn`'s extended `create_df` method, apply a common DataFrame transformation such as `snake_case_col_names`, and use a Column extension such as `isTruthy`.

from pyspark.sql import SparkSession
import quinn
from quinn.extensions import *

# Initialize SparkSession
spark = SparkSession.builder \
    .appName("QuinnQuickstart") \
    .master("local[*]") \
    .getOrCreate()

# NOTE: `spark.create_df` and Column methods like `isTruthy()` are automatically
# available after `from quinn.extensions import *`

# Create a DataFrame using quinn's extended create_df method
data = [
    ("Alice", 1, "USA"),
    ("Bob", 2, "Canada"),
    ("Charlie", 3, "Mexico")
]
schema_def = [
    ("firstName", "string", True),
    ("id", "integer", True),
    ("country", "string", True)
]
df = spark.create_df(data, schema_def)
print("Original DataFrame Schema:")
df.printSchema()
print("Original DataFrame Data:")
df.show()

# Apply a quinn DataFrame transformation: snake_case_col_names
snake_cased_df = quinn.snake_case_col_names(df)
print("\nDataFrame with snake_cased columns:")
snake_cased_df.printSchema()
snake_cased_df.show()

# Demonstrate a Column extension (e.g., isTruthy from quinn.extensions)
from pyspark.sql import functions as F
extended_df = df.withColumn("is_id_truthy", F.col("id").isTruthy())
print("\nDataFrame with 'is_id_truthy' column (using quinn extension):")
extended_df.show()

# Stop SparkSession
spark.stop()
