{"id":10062,"library":"prophecy-libs","title":"Prophecy Python Libraries","description":"Prophecy Python Libraries (`prophecy-libs`) provides helper functions and utilities for Python code generated by the Prophecy data engineering platform. It facilitates the execution, configuration, and integration of Prophecy-generated data pipelines with Apache Spark. The library is actively maintained with frequent releases, typically accompanying platform updates.","status":"active","version":"2.1.17","language":"en","source_language":"en","source_url":"https://github.com/SimpleDataLabsInc/prophecy-python-libs","tags":["data pipelines","code generation","ETL","spark","pyspark","data engineering"],"install":[{"cmd":"pip install prophecy-libs","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core dependency for Spark-based data pipelines. Requires a specific version range (e.g., >=3.3.0,<4.0.0).","package":"pyspark","optional":false},{"reason":"Used for data manipulation within some pipeline components.","package":"pandas","optional":false},{"reason":"Required for interacting with Delta Lake tables.","package":"delta-spark","optional":false},{"reason":"For managing environment variables, often for local development.","package":"python-dotenv","optional":true},{"reason":"For interacting with Databricks platform resources.","package":"databricks-sdk","optional":true}],"imports":[{"note":"Used to access runtime configurations for Prophecy pipelines.","symbol":"ConfigStore","correct":"from prophecy.config import ConfigStore"},{"note":"The class name for user-defined functions changed from `UserDefinedFunctions` to `UDFs` in more recent versions (e.g., 2.0+).","wrong":"from prophecy.udf import UserDefinedFunctions","symbol":"UDFs","correct":"from prophecy.udf import UDFs"},{"note":"The entry point class for launching Prophecy-generated pipelines.","symbol":"ProphecyApp","correct":"from prophecy.main import ProphecyApp"}],"quickstart":{"code":"import 
os\nfrom pyspark.sql import SparkSession\nfrom prophecy.udf import UDFs\n\n# This quickstart demonstrates how to initialize a SparkSession\n# and register Prophecy's User-Defined Functions (UDFs).\n# In a real Prophecy pipeline, this setup is usually handled automatically\n# by the generated pipeline entry point.\n\n# Ensure PySpark is installed and available in your environment.\n# E.g., `pip install pyspark==3.3.0` (or appropriate version based on prophecy-libs requirements)\n\ndef run_quickstart():\n    # Use a spark-warehouse directory under the current working directory for local testing\n    warehouse_dir = os.path.join(os.getcwd(), \"spark-warehouse\")\n    os.makedirs(warehouse_dir, exist_ok=True)\n\n    spark = SparkSession.builder \\\n        .appName(\"ProphecyLibQuickstart\") \\\n        .config(\"spark.sql.warehouse.dir\", warehouse_dir) \\\n        .master(\"local[*]\") \\\n        .getOrCreate()\n\n    try:\n        print(\"SparkSession initialized.\")\n\n        # Register Prophecy UDFs\n        UDFs.register_all_udfs(spark)\n        print(\"Prophecy UDFs registered successfully.\")\n\n        # Example query. Note: `concat` is Spark SQL's built-in function, not a Prophecy UDF.\n        # The UDFs actually available depend on the Prophecy project's definitions,\n        # so substitute the name of a registered UDF to exercise one.\n        df = spark.createDataFrame([(\"hello\", \"world\")], [\"col1\", \"col2\"])\n        try:\n            df.createOrReplaceTempView(\"my_table\")\n            result = spark.sql(\"SELECT concat(col1, ' ', col2) as greeting FROM my_table\")\n            print(\"\\nExample query (Spark's built-in concat as a stand-in for a registered UDF):\")\n            result.show()\n        except Exception as e:\n            print(f\"Could not run the example query: {e}\")\n\n    except Exception as e:\n        print(f\"An error occurred during quickstart: {e}\")\n    finally:\n        spark.stop()\n        print(\"SparkSession stopped.\")\n\nif 
__name__ == \"__main__\":\n    run_quickstart()","lang":"python","description":"This quickstart demonstrates the essential setup of a SparkSession and how to register Prophecy's User-Defined Functions (UDFs). It highlights the core interaction pattern, though typical usage is within code generated and orchestrated by the Prophecy platform."},"warnings":[{"fix":"Understand that direct manual use might require more setup (e.g., SparkSession, ConfigStore initialization) than expected in a generated pipeline environment.","message":"Prophecy-libs is primarily a helper library for code generated by the Prophecy data engineering platform. While usable standalone, its full context and intended behavior are realized within a Prophecy-generated project, where configurations and Spark sessions are often managed automatically by the platform.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always install a `pyspark` version that precisely matches the requirements specified in `prophecy-libs`'s PyPI metadata (e.g., `pip install prophecy-libs 'pyspark>=3.3.0,<4.0.0'`).","message":"Strict dependency on PySpark versions. Prophecy pipelines are built on Spark, and the library has specific PySpark version compatibility requirements (e.g., `pyspark>=3.3.0,<4.0.0` for v2.x.x). Using an incompatible PySpark version will lead to runtime errors.","severity":"breaking","affected_versions":"All versions"},{"fix":"For local testing, consider mocking or carefully initializing `ConfigStore` to simulate the runtime environment. In production, rely on the Prophecy platform's configuration injection mechanisms.","message":"Configuration values (via `ConfigStore`) are typically injected at runtime by the Prophecy platform, especially when deploying to environments like Databricks. 
Manually setting configurations using `ConfigStore.init()` in local tests might be overwritten or behave differently in deployed pipelines.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Install `pyspark` explicitly, ensuring the version is compatible: `pip install pyspark==3.3.0` (adjust version as per `prophecy-libs` requirements).","cause":"`pyspark` is a core dependency but is not automatically installed with `prophecy-libs` due to its typical external management in Spark environments.","error":"ModuleNotFoundError: No module named 'pyspark'"},{"fix":"Ensure a `SparkSession` is initialized and passed to the method: `spark = SparkSession.builder.appName(...).getOrCreate(); UDFs.register_all_udfs(spark)`.","cause":"Methods that interact with Spark, such as UDF registration, require an active `SparkSession` object as an argument.","error":"TypeError: register_all_udfs() missing 1 required positional argument: 'spark'"},{"fix":"In generated code, configurations are usually accessed via `ConfigStore.get_config().my_setting`. For local testing, ensure `ConfigStore.init(...)` has been called or mock the configuration object.","cause":"Misunderstanding how configurations are accessed or initialized. `ConfigStore` is typically populated by the Prophecy runtime. Attempting to access non-existent attributes or before initialization will fail.","error":"AttributeError: 'ProphecyConfiguration' object has no attribute 'get_config' (or similar config access issues)"},{"fix":"Ensure your Spark session includes necessary JARs (e.g., `spark-hadoop-cloud` for cloud storage) and has appropriate credentials/permissions to access the data source. 
For local PySpark, configure `spark-submit` with `--packages`.","cause":"The Spark environment is not correctly configured for the specific data source, or there are missing connectors (JARs) or insufficient permissions.","error":"py4j.protocol.Py4JJavaError: An error occurred while calling o0.parquet (or other data source errors like S3, GCS, ADLS)"}]}