{"library":"pyspark-test","title":"PySpark DataFrame Testing Utility","description":"pyspark-test is a Python library designed to simplify unit testing of PySpark DataFrames. Its main function, `assert_pyspark_df_equal`, is inspired by the pandas testing module: it compares two Spark DataFrames and raises an `AssertionError` when they differ. The library is at version 0.2.0, which is also the latest release on PyPI; releases are infrequent and concentrate on the core DataFrame comparison functionality.","language":"python","status":"active","last_verified":"2026-05-17","install":{"commands":["pip install pyspark-test"],"cli":null},"imports":["from pyspark_test import assert_pyspark_df_equal"],"auth":{"required":false,"env_vars":[]},"quickstart":{"code":"import datetime\n\nfrom pyspark.sql import SparkSession\nfrom pyspark.sql.types import StructType, StructField, DateType, StringType, DoubleType, LongType\n\nfrom pyspark_test import assert_pyspark_df_equal\n\n# Start a local SparkSession for testing\nspark = SparkSession.builder.master('local[1]').appName('pyspark-test-quickstart').getOrCreate()\n\n# Shared schema for all example DataFrames\nschema = StructType([\n    StructField('col_a', DateType(), True),\n    StructField('col_b', StringType(), True),\n    StructField('col_c', DoubleType(), True),\n    StructField('col_d', LongType(), True),\n])\n\n# Create two identical DataFrames\ndf_1 = spark.createDataFrame(\n    data=[\n        [datetime.date(2020, 1, 1), 'apple', 1.123, 10],\n        [None, 'banana', 2.345, 20],\n    ],\n    schema=schema,\n)\ndf_2 = spark.createDataFrame(\n    data=[\n        [datetime.date(2020, 1, 1), 'apple', 1.123, 10],\n        [None, 'banana', 2.345, 20],\n    ],\n    schema=schema,\n)\n\n# Assert that the two DataFrames are equal: same schema, same column names and order, same rows after sorting\nprint(\"Asserting identical DataFrames...\")\nassert_pyspark_df_equal(df_1, df_2, check_dtype=True, check_column_names=True, check_columns_in_order=True, order_by=['col_a', 'col_b'])\nprint(\"Assertion successful: DataFrames are equal.\")\n\n# A DataFrame whose second row differs, to demonstrate a failing assertion\ndf_3 = spark.createDataFrame(\n    data=[\n        [datetime.date(2020, 1, 1), 'apple', 1.123, 10],\n        [None, 'orange', 99.999, 20],  # changed values in col_b and col_c\n    ],\n    schema=schema,\n)\n\nprint(\"\\nAsserting different DataFrames (expected to fail)...\")\ntry:\n    assert_pyspark_df_equal(df_1, df_3, check_dtype=True, check_column_names=True, check_columns_in_order=True, order_by=['col_a', 'col_b'])\nexcept AssertionError as e:\n    print(f\"Caught expected error: {e}\")\nelse:\n    print(\"Unexpected: the DataFrames compared as equal.\")\n\n# Stop the SparkSession\nspark.stop()\n","lang":"python","description":"This quickstart shows how to use `assert_pyspark_df_equal` to compare two PySpark DataFrames. It starts a local SparkSession, then runs one assertion that passes and one that is expected to fail, to illustrate the error reporting. `check_dtype`, `check_column_names`, and `check_columns_in_order` make the comparison strict about the schema, and `order_by` sorts both DataFrames by the listed columns before their rows are compared.","tag":null,"tag_description":null,"last_tested":null,"results":[]},"compatibility":{"tag":null,"tag_description":null,"last_tested":"2026-05-17","installed_version":"0.2.0","pypi_latest":"0.2.0","is_stale":false,"summary":{"python_range":"3.9–3.13","success_rate":100,"avg_install_s":31.2,"avg_import_s":0.46,"wheel_type":"sdist"},"results":[{"runtime":"python:3.10-alpine","python_version":"3.10","os_libc":"alpine (musl)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.39,"mem_mb":12.8,"disk_size":"505.1M"},{"runtime":"python:3.10-slim","python_version":"3.10","os_libc":"slim (glibc)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":32.7,"import_time_s":0.29,"mem_mb":12.8,"disk_size":"506M"},{"runtime":"python:3.11-alpine","python_version":"3.11","os_libc":"alpine (musl)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.61,"mem_mb":13.9,"disk_size":"511.1M"},{"runtime":"python:3.11-slim","python_version":"3.11","os_libc":"slim (glibc)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":30,"import_time_s":0.59,"mem_mb":13.9,"disk_size":"512M"},{"runtime":"python:3.12-alpine","python_version":"3.12","os_libc":"alpine (musl)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.51,"mem_mb":13.9,"disk_size":"500.1M"},{"runtime":"python:3.12-slim","python_version":"3.12","os_libc":"slim (glibc)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":32,"import_time_s":0.53,"mem_mb":13.9,"disk_size":"501M"},{"runtime":"python:3.13-alpine","python_version":"3.13","os_libc":"alpine (musl)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.56,"mem_mb":14.3,"disk_size":"499.4M"},{"runtime":"python:3.13-slim","python_version":"3.13","os_libc":"slim (glibc)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":29.9,"import_time_s":0.5,"mem_mb":14.3,"disk_size":"500M"},{"runtime":"python:3.9-alpine","python_version":"3.9","os_libc":"alpine (musl)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":null,"import_time_s":0.34,"mem_mb":12.1,"disk_size":"483.5M"},{"runtime":"python:3.9-slim","python_version":"3.9","os_libc":"slim (glibc)","variant":"pyspark-test","exit_code":0,"wheel_type":"sdist","failure_reason":null,"import_side_effects":"clean","install_time_s":31.5,"import_time_s":0.3,"mem_mb":12.1,"disk_size":"484M"}]}}