{"id":7733,"library":"snowpark-connect-deps-2","title":"Snowpark Connect Dependencies Part 2","description":"The `snowpark-connect-deps-2` package provides supporting JAR dependencies essential for Snowflake's Snowpark Connect for Spark. Snowpark Connect lets developers execute Apache Spark workloads directly on Snowflake's compute engine using familiar Spark DataFrame APIs, without the overhead of managing a dedicated Spark cluster. Together with `snowpark-connect-deps-1`, this package underpins the user-facing `snowpark-connect` library, part of the broader Snowpark for Python ecosystem. It is currently at version 3.56.4 and is released in lockstep with `snowpark-connect`, following its rapid release cadence.","status":"active","version":"3.56.4","language":"en","source_language":"en","source_url":"https://github.com/snowflakedb/snowpark-python","tags":["snowflake","snowpark","spark","connect","dependency","data-engineering","etl"],"install":[{"cmd":"pip install snowpark-connect-deps-2","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Required Python interpreter versions as specified by the package metadata.","package":"python","version":">=3.10, <3.13","optional":false}],"imports":[],
"quickstart":{"code":"import os\nfrom snowflake import snowpark_connect\n\n# Enable Spark Connect mode before starting the session\nos.environ[\"SPARK_CONNECT_MODE_ENABLED\"] = \"1\"\n\n# Configure connection parameters (replace the placeholders with your Snowflake details).\n# Prefer environment variables or a configuration file for sensitive values such as passwords.\nos.environ[\"SNOWFLAKE_ACCOUNT\"] = os.environ.get(\"SNOWFLAKE_ACCOUNT\", \"your_account_identifier\")\nos.environ[\"SNOWFLAKE_USER\"] = os.environ.get(\"SNOWFLAKE_USER\", \"your_username\")\nos.environ[\"SNOWFLAKE_PASSWORD\"] = os.environ.get(\"SNOWFLAKE_PASSWORD\", \"your_password\")\nos.environ[\"SNOWFLAKE_ROLE\"] = os.environ.get(\"SNOWFLAKE_ROLE\", \"your_role\")\nos.environ[\"SNOWFLAKE_WAREHOUSE\"] = os.environ.get(\"SNOWFLAKE_WAREHOUSE\", \"your_warehouse\")\nos.environ[\"SNOWFLAKE_DATABASE\"] = os.environ.get(\"SNOWFLAKE_DATABASE\", \"your_database\")\nos.environ[\"SNOWFLAKE_SCHEMA\"] = os.environ.get(\"SNOWFLAKE_SCHEMA\", \"your_schema\")\n\n# Start the Spark Connect session backed by Snowflake\nsnowpark_connect.start_session()\nspark = snowpark_connect.get_session()\n\n# Example: create a DataFrame and show its contents\ndata = [(\"Alice\", 1), (\"Bob\", 2), (\"Charlie\", 3)]\ndf = spark.createDataFrame(data, [\"Name\", \"ID\"])\ndf.show()\n\n# Stop the Spark session when done\nspark.stop()\n","lang":"python","description":"This package is a backend dependency: users never import `snowpark-connect-deps-2` directly. The quickstart instead shows how to use `snowpark-connect` (the user-facing library that depends on this package) to establish a Spark session and perform basic DataFrame operations against Snowflake. Configure your Snowflake connection parameters, preferably via environment variables, before starting the session."},
"warnings":[{"fix":"Focus troubleshooting on the `snowpark-connect` library, the Python environment, and the JDK setup rather than on this dependency package.","message":"The `snowpark-connect-deps-2` package is a low-level dependency of `snowpark-connect`. Direct interaction or imports are not expected; issues usually stem from environmental setup or from `snowpark-connect` itself.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Ensure a compatible JDK is installed and that the `JAVA_HOME` environment variable points to its installation path. Tools like `jdk4py` (an optional dependency of `snowpark-connect`) can assist with programmatic JDK configuration.","message":"Snowpark Connect for Spark requires a correctly configured Java Development Kit (JDK), typically Java 11 or 17. Without it, `snowpark-connect` may fail to initialize or run properly.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Be aware of these implicit conversions when defining schemas or expecting specific integer precision. Explicitly cast types if precise control is needed, or consult the Snowpark Connect for Spark compatibility guide.","message":"Snowpark Connect for Spark implicitly converts certain Spark integral data types (`ByteType`, `ShortType`, `IntegerType`) to `LongType` when operating on data, which can lead to unexpected type changes.","severity":"breaking","affected_versions":"All versions of Snowpark Connect for Spark"},{"fix":"Refactor code to avoid embedding UDFs directly within lambda expressions. Use built-in SQL functions or standalone UDFs where possible for optimal performance and compatibility.","message":"Snowpark Connect for Spark has limitations regarding User-Defined Functions (UDFs) within lambda expressions: UDFs are generally not supported inside lambdas, including some built-in functions implemented as Snowflake UDFs.","severity":"gotcha","affected_versions":"All versions of Snowpark Connect for Spark"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z",
"problems":[{"fix":"Update `setuptools` (`pip install --upgrade setuptools`). If the issue persists, recreate your virtual environment and reinstall `snowflake-snowpark-python` and `snowpark-connect`.","cause":"This error, or a similar `DeprecationWarning: pkg_resources is deprecated`, often indicates a package-resolution problem, typically caused by outdated `setuptools` or an environment conflict.","error":"AttributeError: module 'snowflake.snowpark' has no attribute '_internal'"},{"fix":"Install a supported JDK (e.g., OpenJDK 11 or 17) and ensure the `JAVA_HOME` environment variable points to the root directory of your JDK installation, for example `export JAVA_HOME=/path/to/jdk-17`.","cause":"Snowpark Connect for Spark relies on Java; this error indicates that a Java Virtual Machine (JVM) could not be located or initialized in the environment.","error":"java.lang.RuntimeException: [FATAL] No JVM found."},{"fix":"Verify column names and their casing against the actual schema. Snowflake stores unquoted identifiers in uppercase by default, so ensure consistency or use proper quoting if mixed-case identifiers are needed. Review the Snowpark Connect for Spark compatibility guide for semantic differences.","cause":"This typically occurs when a column name used in a Spark DataFrame operation does not exist in the DataFrame's schema, often due to case-sensitivity differences between Spark and Snowflake or incorrect transformations.","error":"org.apache.spark.sql.AnalysisException: Cannot resolve '`your_column`' given input columns"}]}