{"library":"sparkaid","title":"SparkAid","type":"library","description":"SparkAid is a Python utility library (version 1.0.0) designed to simplify common data manipulation tasks in Apache Spark, particularly for DataFrames with complex, nested schemas. It provides functions to address challenges like schema flattening and working with structured types. The library has a slow release cadence, with its latest version released in August 2022.","language":"python","status":"maintenance","last_verified":"Thu Apr 16","install":{"commands":["pip install sparkaid"],"cli":null},"imports":["from sparkaid import flatten"],"auth":{"required":false,"env_vars":[]},"links":{"homepage":null,"github":"https://github.com/lvhuyen/SparkAid","docs":null,"changelog":null,"pypi":"https://pypi.org/project/sparkaid/","npm":null,"openapi_spec":null,"status_page":null,"smithery":null},"quickstart":{"code":"from pyspark.sql import SparkSession\nfrom pyspark.sql.types import StructType, StructField, StringType, ArrayType, IntegerType\nfrom sparkaid import flatten\n\n# Initialize Spark Session\nspark = SparkSession.builder \\\n    .appName(\"SparkAidQuickstart\") \\\n    .master(\"local[*]\") \\\n    .getOrCreate()\n\n# Create a sample DataFrame with nested structure\ndata = [\n    (\"Alice\", {\"city\": \"New York\", \"zip\": 10001}, [\"apple\", \"banana\"]),\n    (\"Bob\", {\"city\": \"Los Angeles\", \"zip\": 90001}, [\"orange\"]),\n    (\"Charlie\", None, [\"grape\", \"kiwi\", \"mango\"])\n]\n\nschema = StructType([\n    StructField(\"name\", StringType(), True),\n    StructField(\"address\", StructType([\n        StructField(\"city\", StringType(), True),\n        StructField(\"zip\", IntegerType(), True)\n    ]), True),\n    StructField(\"fruits\", ArrayType(StringType()), True)\n])\n\ndf = spark.createDataFrame(data, schema)\nprint(\"Original Schema:\")\ndf.printSchema()\n\nprint(\"\\nFlattening DataFrame:\")\n# Flatten the DataFrame. 
By default, it flattens only StructType columns.\n# Arrays are unpacked only when named in 'arrays_to_unpack' (e.g. [\"*\"]), per the v1.0.0 breaking change; here only \"fruits\" is listed.\nflattened_df = flatten(df, nested_struct_separator=\"__\", arrays_to_unpack=[\"fruits\"])\n\nprint(\"Flattened Schema:\")\nflattened_df.printSchema()\nprint(\"Flattened Data:\")\nflattened_df.show()\n\nspark.stop()","lang":"python","description":"Initializes a SparkSession, builds a DataFrame with a nested struct column and an array column, and flattens it with `sparkaid.flatten`. The `arrays_to_unpack` argument names the array column to unpack.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":null}