{"id":7606,"library":"pyspark-stubs","title":"PySpark Stubs","description":"PySpark Stubs (pyspark-stubs) provides automatically generated type stubs for the Apache PySpark library. These stubs let IDEs and static type checkers like MyPy offer accurate code completion and catch common type errors in PySpark applications before they run. The current version is 3.0.0.post3. Releases are typically aligned with major PySpark releases, often with `post` versions for stub refinements.","status":"active","version":"3.0.0.post3","language":"en","source_language":"en","source_url":"https://github.com/zero323/pyspark-stubs","tags":["pyspark","type hints","stubs","typing","mypy","static analysis"],"install":[{"cmd":"pip install pyspark-stubs","lang":"bash","label":"Install latest version"},{"cmd":"pip install 'pyspark-stubs==3.0.*'","lang":"bash","label":"Install a specific release series (3.0.x)"}],"dependencies":[{"reason":"These are type stubs *for* PySpark; PySpark itself must be installed separately to run code, though it's not a direct install dependency of this stub package.","package":"pyspark","optional":false}],"imports":[{"note":"You import directly from `pyspark`, not `pyspark-stubs`. 
`pyspark-stubs` provides type information to your type checker (e.g., MyPy, VS Code Pylance) for these PySpark imports.","symbol":"SparkSession","correct":"from pyspark.sql import SparkSession"}],"quickstart":{"code":"from typing import List\n\nfrom pyspark.sql import DataFrame, SparkSession\n\n# Instantiate SparkSession (requires PySpark to be installed and configured)\nspark: SparkSession = (SparkSession.builder\n    .appName(\"PySparkStubsExample\")\n    .getOrCreate()\n)\n\n# Example of using PySpark with type hints\ndef process_data(data: List[int]) -> List[int]:\n    # In a real scenario, this would involve Spark RDDs/DataFrames\n    # This is a simplified example to show type hints in action.\n    # For a type checker, 'pyspark-stubs' helps validate Spark-specific types.\n    print(f\"Processing data: {data}\")\n    return [x * 2 for x in data]\n\nif __name__ == '__main__':\n    sample_data: List[int] = [1, 2, 3]\n    processed_result = process_data(sample_data)\n    print(f\"Processed result: {processed_result}\")\n\n    # Example with a Spark DataFrame (for demonstration of type support)\n    # This part requires a running SparkSession and actual PySpark code.\n    # For type checking, `pyspark-stubs` ensures `spark` and `data_df` are typed correctly.\n    data_df: DataFrame = spark.createDataFrame([(\"Alice\", 1), (\"Bob\", 2)], [\"name\", \"age\"])\n    data_df.printSchema()\n    data_df.show()\n\n    spark.stop()","lang":"python","description":"This quickstart demonstrates how to use PySpark with type hints, leveraging `pyspark-stubs`. Install `pyspark-stubs` alongside your `pyspark` installation. When a type checker like MyPy processes this code, it uses the installed stubs to validate types for PySpark objects like `SparkSession` and DataFrame methods. The stubs themselves have no runtime effect; they only assist static analysis."},"warnings":[{"fix":"Run `pip install 'pyspark-stubs==X.Y.*'`, where `X.Y` matches your `pyspark` major.minor version. 
For example, for PySpark 3.0.x, use `pyspark-stubs==3.0.*`.","message":"The version of `pyspark-stubs` should generally match the major.minor version of your `pyspark` installation. Mismatched versions can lead to incorrect type checking results, including missing attributes or incompatible type signatures.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Install `pyspark` (e.g., `pip install pyspark`) in addition to `pyspark-stubs` if you intend to run PySpark code.","message":"`pyspark-stubs` only provides type hint files (`.pyi`); it does not include or install the actual `pyspark` library. Your code will not run if `pyspark` is not installed separately.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Treat `pyspark-stubs` as a development-time dependency for type checking, not a runtime dependency. If you encounter runtime errors, they are related to your PySpark setup, not the stubs.","message":"Installing `pyspark-stubs` has no runtime effect on your PySpark application. Its sole purpose is to provide static type information to type checkers such as MyPy and Pylance and to IDEs.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Run `pip install pyspark-stubs` in the same Python environment that `mypy` checks, whether that is a global installation or your project's virtual environment, and make sure your `mypy` configuration (e.g., `mypy.ini`) points to that environment.","cause":"The `pyspark-stubs` package is either not installed, or `mypy` cannot find it in your environment's `site-packages`.","error":"mypy: error: Cannot find module named 'pyspark'"},{"fix":"Verify that your `pyspark-stubs` version matches your `pyspark` version (e.g., `pyspark-stubs==3.0.*` for `pyspark==3.0.*`). 
Also, confirm that `pyspark-stubs` is correctly installed in the environment `mypy` is checking.","cause":"This typically indicates a version mismatch between `pyspark-stubs` and your installed `pyspark` version, or that the stubs for `SparkSession` are not being correctly picked up by `mypy`.","error":"mypy: error: Module 'pyspark.sql' has no attribute 'SparkSession'"},{"fix":"Ensure you are using a recent version of `pyspark-stubs` (3.0.0.post1+). If the issue persists and `pyspark` is already installed, try installing `pyspark-stubs` with `--no-deps` to skip its (potential) `pyspark` dependency, then check that the two versions match. This is usually not needed for modern `pyspark-stubs` versions.","cause":"Older versions of `pyspark-stubs` might have listed `pyspark` as an optional dependency with specific version requirements, which could conflict if you already have `pyspark` installed with different constraints. The current approach usually avoids this.","error":"pip install pyspark-stubs fails with a dependency error related to `pyspark` (e.g., 'Requires-Dist: pyspark')"}]}