Snowflake Snowpark Python
Snowflake Snowpark for Python provides an intuitive API for querying and processing data in Snowflake using Python. It lets data engineers and data scientists build data pipelines and machine learning workflows that run directly within Snowflake, on its elastic, scalable, and secure engine. The library is actively maintained, with releases typically every few weeks that bring new features, improvements, and bug fixes.
Warnings
- breaking Snowpark Python has dropped support for Python 3.8. Version 1.24.0 was the last to support it. Using Snowpark Python with Python 3.8 will trigger deprecation warnings.
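A minimal sketch of a fail-fast interpreter check, for code that should stop with a clear message instead of failing later during install or import. `MIN_PYTHON` and `ensure_supported_python` are hypothetical names, and the `(3, 9)` floor is an assumption derived from the note above; check the release notes for the current minimum.

```python
import sys

# Assumption: current snowflake-snowpark-python releases need Python 3.9+,
# because 1.24.0 was the last release to support Python 3.8.
MIN_PYTHON = (3, 9)


def ensure_supported_python(version_info=None) -> None:
    """Raise a RuntimeError early if the interpreter is too old for current Snowpark."""
    current = tuple((version_info or sys.version_info)[:2])
    if current < MIN_PYTHON:
        raise RuntimeError(
            f"Python {current[0]}.{current[1]} is not supported by current "
            "snowflake-snowpark-python releases; use Python "
            f"{MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ or pin "
            "snowflake-snowpark-python==1.24.0."
        )
```

Call `ensure_supported_python()` at the top of your entry point, before importing `snowflake.snowpark`.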
- gotcha The 'overwrite' mode of `DataFrameWriter.save_as_table` drops and recreates the target table, so existing rows that are not in the DataFrame are lost and grants on the table are affected. 'overwrite' is not the default mode (the default is 'errorifexists'), but choosing it can be unexpected when only a partial update is intended.
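A minimal sketch of two alternatives to 'overwrite', assuming your installed release supports the 'truncate' and 'append' modes of `save_as_table`. The helper names are hypothetical. If you do need 'overwrite', recent releases also document a `copy_grants` parameter for keeping grants on the recreated table; verify it exists in your version before relying on it.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only; no Snowflake connection needed
    from snowflake.snowpark import DataFrame


def replace_rows_keep_table(df: "DataFrame", table_name: str) -> None:
    # "truncate" empties the existing table and loads the DataFrame's rows into it;
    # the table object itself (and the grants on it) is not dropped.
    df.write.save_as_table(table_name, mode="truncate")


def append_rows(df: "DataFrame", table_name: str) -> None:
    # "append" adds the DataFrame's rows and creates the table if it does not exist.
    df.write.save_as_table(table_name, mode="append")
```

'truncate' replaces the full contents while keeping the table; 'append' never deletes rows. Neither performs a keyed upsert of matching rows.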
- gotcha When registering UDFs/SPROCs, passing an empty list (`[]`) for the `imports` or `packages` argument now explicitly means *no* imports/packages for that specific UDF/SPROC; omitting the argument (or passing `None`) uses the session-level imports/packages. In older versions, an empty list implicitly meant using the session-level imports/packages.
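A minimal sketch contrasting the two cases with `session.udf.register`. The function and UDF names are hypothetical; input and return types are inferred from the type hints on `double`.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import Session


def double(x: int) -> int:
    # Snowpark infers the UDF's input and return types from these hints.
    return x * 2


def register_double(session: "Session") -> None:
    # imports/packages omitted (None): this UDF uses the session-level imports
    # and packages added with session.add_import() / session.add_packages().
    session.udf.register(double, name="double_with_session_deps", replace=True)

    # Explicit empty lists: this UDF gets no imports and no packages,
    # regardless of what the session has configured.
    session.udf.register(
        double,
        name="double_no_deps",
        imports=[],
        packages=[],
        replace=True,
    )
```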
- bug A bug existed where `Session.udf.register_from_file` did not properly process the `strict` and `secure` parameters, potentially leading to UDFs not being created with the intended security or null-handling characteristics.
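A minimal sketch of where `strict` and `secure` are passed to `register_from_file`. The helper name, file path, and function names are hypothetical. On releases affected by the bug above these flags may not take effect, so confirm the created function's properties after registering.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import Session


def register_strict_secure_udf_from_file(
    session: "Session", file_path: str, func_name: str, name: str
) -> None:
    session.udf.register_from_file(
        file_path=file_path,  # local or staged Python file containing the handler
        func_name=func_name,  # name of the handler function inside that file
        name=name,
        strict=True,  # return NULL for any NULL argument without running the handler
        secure=True,  # create a secure UDF whose definition is hidden from non-owners
        replace=True,
    )
```

For example: `register_strict_secure_udf_from_file(session, "udfs/math_udfs.py", "add_one", "add_one_secure")`.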
- gotcha Managing Python packages not available in Snowflake's Anaconda channel for UDFs and stored procedures can be complex. These often require manual zipping and uploading to Snowflake stages, and careful management of `imports` and `packages` parameters.
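A minimal sketch of the manual route for a pure-Python package kept in a local directory. `zip_package` and `attach_package` are hypothetical helpers: `shutil.make_archive` builds a zip with the package folder at its root, and `Session.add_import` registers that zip as a session-level import, which UDFs/SPROCs registered without an explicit `imports` list pick up.

```python
import shutil
from pathlib import Path
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from snowflake.snowpark import Session


def zip_package(package_dir: str, out_dir: str) -> str:
    """Zip a local pure-Python package and return the archive path."""
    pkg = Path(package_dir).resolve()
    # root_dir=parent + base_dir=name keeps "<package>/..." at the top of the
    # archive, so `import <package>` can resolve once the zip is on sys.path.
    return shutil.make_archive(
        base_name=str(Path(out_dir) / pkg.name),
        format="zip",
        root_dir=str(pkg.parent),
        base_dir=pkg.name,
    )


def attach_package(session: "Session", zip_path: str) -> None:
    # Session-level import: uploaded and made available to UDFs/SPROCs
    # registered afterwards with imports omitted (None).
    session.add_import(zip_path)
```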
Install
pip install snowflake-snowpark-python
Imports
- Session
from snowflake.snowpark import Session
- functions
from snowflake.snowpark import functions
- types
from snowflake.snowpark import types
- DataFrame
from snowflake.snowpark import DataFrame  # mainly for type hints; DataFrame objects are usually returned by Session methods such as session.create_dataframe()
Quickstart
import os
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col
# Establish a Snowpark Session using environment variables.
# Replace the placeholders with your actual connection parameters, or configure ~/.snowflake/connections.toml.
connection_parameters = {
    "account": os.environ.get("SNOWFLAKE_ACCOUNT", "your_account_identifier"),
    "user": os.environ.get("SNOWFLAKE_USER", "your_username"),
    "password": os.environ.get("SNOWFLAKE_PASSWORD", "your_password"),
    "role": os.environ.get("SNOWFLAKE_ROLE", "your_role"),
    "warehouse": os.environ.get("SNOWFLAKE_WAREHOUSE", "your_warehouse"),
    "database": os.environ.get("SNOWFLAKE_DATABASE", "your_database"),
    "schema": os.environ.get("SNOWFLAKE_SCHEMA", "your_schema"),
}
session = Session.builder.configs(connection_parameters).create()
print("Snowpark Session created successfully.")
try:
    # Create a simple DataFrame
    data = [("Alice", 1), ("Bob", 2), ("Charlie", 3)]
    df = session.create_dataframe(data, schema=["name", "id"])
    # filter() is lazy; show() runs the query in Snowflake and prints the result
    df.filter(col("id") > 1).show()
finally:
    # Close the session even if a query fails
    session.close()
    print("Snowpark Session closed.")
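The Quickstart falls back to placeholder strings when an environment variable is unset, which only surfaces later as a login error. A minimal sketch of a stricter loader, assuming password authentication and the same `SNOWFLAKE_*` variable names; `connection_parameters_from_env` is a hypothetical helper, not part of Snowpark.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("SNOWFLAKE_ACCOUNT", "SNOWFLAKE_USER", "SNOWFLAKE_PASSWORD")
OPTIONAL = ("SNOWFLAKE_ROLE", "SNOWFLAKE_WAREHOUSE", "SNOWFLAKE_DATABASE", "SNOWFLAKE_SCHEMA")


def connection_parameters_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build a Session config dict from SNOWFLAKE_* variables, failing fast on gaps."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    params: Dict[str, str] = {}
    for name in REQUIRED + OPTIONAL:
        value = env.get(name)
        if value:  # leave unset optional keys out instead of sending placeholders
            # "SNOWFLAKE_WAREHOUSE" -> "warehouse"
            params[name[len("SNOWFLAKE_"):].lower()] = value
    return params
```

Usage: `session = Session.builder.configs(connection_parameters_from_env()).create()`.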