Snowpark Connect Dependencies Part 2

3.56.4 · active · verified Thu Apr 16

The `snowpark-connect-deps-2` package provides supporting JAR dependencies essential for Snowflake's Snowpark Connect for Spark. Snowpark Connect enables developers to execute Apache Spark workloads directly on Snowflake's high-performance compute engine, leveraging familiar Spark DataFrame APIs without the overhead of managing a dedicated Spark cluster. This package, alongside `snowpark-connect-deps-1`, underpins the functionality of the user-facing `snowpark-connect` library, which is part of the broader Snowpark for Python ecosystem. It is currently at version 3.56.4 and follows a rapid release cadence in conjunction with the `snowpark-connect` library.


Install
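This package is not installed directly; it is pulled in as a dependency of the user-facing `snowpark-connect` package. A minimal sketch of the typical pip workflow (package names as published on PyPI are assumed; pinning is optional):

```shell
# Install the user-facing library; snowpark-connect-deps-1 and
# snowpark-connect-deps-2 are resolved automatically as dependencies.
pip install snowpark-connect

# Or, to pin the dependency package explicitly to match this release:
pip install "snowpark-connect-deps-2==3.56.4"
```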

Quickstart

This package is a backend dependency; you never import `snowpark-connect-deps-2` directly. The quickstart below uses `snowpark-connect`, the user-facing library that depends on it, to start a Spark session and run basic DataFrame operations against Snowflake. Configure your Snowflake connection parameters, preferably via environment variables, before starting the session.

import os
from snowflake import snowpark_connect

# Set environment variable to enable Spark Connect mode
os.environ["SPARK_CONNECT_MODE_ENABLED"] = "1"

# Configure connection parameters (replace with your Snowflake details)
# It's recommended to use environment variables or a configuration file
# for sensitive information like passwords.
os.environ.setdefault("SNOWFLAKE_ACCOUNT", "your_account_identifier")
os.environ.setdefault("SNOWFLAKE_USER", "your_username")
os.environ.setdefault("SNOWFLAKE_PASSWORD", "your_password")
os.environ.setdefault("SNOWFLAKE_ROLE", "your_role")
os.environ.setdefault("SNOWFLAKE_WAREHOUSE", "your_warehouse")
os.environ.setdefault("SNOWFLAKE_DATABASE", "your_database")
os.environ.setdefault("SNOWFLAKE_SCHEMA", "your_schema")

# Start the Spark Connect session
snowpark_connect.start_session()
spark = snowpark_connect.get_session()

# Example: Create a DataFrame and show data
data = [("Alice", 1), ("Bob", 2), ("Charlie", 3)]
df = spark.createDataFrame(data, ["Name", "ID"])
df.show()

# Stop the Spark session when done
spark.stop()
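As an alternative to environment variables, Snowflake's Python drivers can read a named connection from `~/.snowflake/connections.toml`. The fragment below is a sketch with placeholder values; the exact keys honored depend on your driver version:

# ~/.snowflake/connections.toml -- placeholder values, adjust to your account
[default]
account = "your_account_identifier"
user = "your_username"
password = "your_password"
role = "your_role"
warehouse = "your_warehouse"
database = "your_database"
schema = "your_schema"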
