pytest-spark

0.8.0 · active · verified Wed Apr 15

pytest-spark is a pytest plugin that simplifies testing PySpark applications by automatically providing session-scoped `spark_context` and `spark_session` fixtures. It enables users to configure the Spark environment, including setting SPARK_HOME and custom `spark_options`, directly within `pytest.ini`. The current version is 0.8.0, with an active development and release cycle.
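The `pytest.ini` configuration mentioned above might look like the following sketch. The option names `spark_home` and `spark_options` are the plugin's; the path and the Spark settings themselves are illustrative placeholders:

```ini
[pytest]
; Path to your Spark installation (used if SPARK_HOME is not already set)
spark_home = /opt/spark
; Arbitrary Spark options passed to the session, one "key: value" per line
spark_options =
    spark.app.name: my-pytest-spark-tests
    spark.executor.instances: 1
```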

Warnings

Install
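The plugin is distributed on PyPI, so a typical install (assuming a working `pip` and a PySpark installation or `SPARK_HOME` available) is:

```shell
pip install pytest-spark
```

Once installed, pytest discovers the plugin automatically; no explicit registration is needed.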

Imports

Quickstart

Once installed, the plugin provides the session-scoped `spark_context` and `spark_session` fixtures automatically, so tests can simply accept them as arguments. If you need custom builder settings, you can instead define your own session-scoped `spark_session` fixture in a `conftest.py` file, as in the example below. Run the tests with `pytest` from your terminal.

# conftest.py (placed in your project's root or tests directory)
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark_session():
    """
    Fixture for creating a SparkSession for testing.
    This SparkSession is reused across all tests in the session.
    """
    spark = SparkSession.builder \
        .master("local[*]") \
        .appName("pytest-spark-session") \
        .config("spark.driver.memory", "2g") \
        .getOrCreate()
    yield spark
    spark.stop()

# test_example.py (a sample test file)
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

def test_data_frame_creation(spark_session):
    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True)
    ])
    data = [("Alice", 1), ("Bob", 2)]
    df = spark_session.createDataFrame(data, schema)
    
    assert df.count() == 2
    assert df.columns == ["name", "age"]
    assert df.collect()[0].name == "Alice"
