Databricks PyPI Extras
`databricks-pypi-extras` is a Python library from Databricks that bundles utilities to enhance the Databricks user experience and extend existing PyPI libraries for better compatibility with Databricks environments. It currently includes modules such as `databricks.connect_extras`, which simplifies operations like connecting to Databricks (e.g., via Databricks Connect v2) and interacting with notebook contexts. The current version is 0.1, with releases likely occurring as new utilities are developed and integrated.
Common errors
- `ModuleNotFoundError: No module named 'databricks.connect_extras.context'`
  - **Cause:** The `databricks-pypi-extras` library or a specific submodule is not installed, or the import path is incorrect.
  - **Fix:** Install the library with `pip install databricks-pypi-extras`. Verify the import path against the official GitHub repository's `src` directory, as the library is modular and may introduce new sub-packages.
- `Spark session could not be retrieved. Ensure Databricks Connect is properly configured or run in a Databricks Notebook.`
  - **Cause:** `current_spark_context()` or a similar utility from `databricks.connect_extras` was called outside a properly configured Databricks Connect environment or a Databricks Notebook.
  - **Fix:** Install Databricks Connect (`pip install "databricks-connect[databricks-connect-dependencies]"`) and configure it with `databricks-connect configure`, or run your code directly within a Databricks notebook.
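Since the library is modular and sub-packages may move between releases, a guarded import keeps scripts usable when a sub-package is missing. A minimal sketch (the `None` fallback is this sketch's convention, not library behavior):

```python
# Guarded import: fall back gracefully if the sub-package is unavailable.
try:
    from databricks.connect_extras.context import current_spark_context
except ImportError as exc:
    current_spark_context = None
    print(f"databricks.connect_extras unavailable: {exc}")

# Later code can check before calling the utility.
spark_available = current_spark_context is not None
```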
Warnings
- **Gotcha:** Many utilities in `databricks-pypi-extras` (especially in `connect_extras`) are designed for specific Databricks environments (e.g., Databricks Notebooks or Databricks Connect v2). Running them outside these environments may lead to `None` returns, errors, or unexpected behavior.
- **Breaking:** As the library is currently at version 0.1, its API is considered experimental and highly subject to change. Future versions may introduce breaking changes to existing modules, functions, or their signatures without prior deprecation cycles.
Install
```shell
pip install databricks-pypi-extras
```
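Because the 0.1 API is experimental (see Warnings), pinning the exact version in your environment can guard against surprise breakage between releases:

```shell
# Pin the exact version to avoid breaking changes between 0.x releases.
pip install "databricks-pypi-extras==0.1"
```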
Imports
```python
from databricks.connect_extras.context import current_spark_context
from databricks.connect_extras.notebook import get_current_notebook_path
```
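`get_current_notebook_path` is only meaningful inside a Databricks notebook. A hedged sketch that tolerates both a missing library and a missing notebook context (treating an unavailable context as `None` is an assumption based on the Warnings above):

```python
# Resolve the notebook path defensively: the import or the context may be absent.
try:
    from databricks.connect_extras.notebook import get_current_notebook_path
    notebook_path = get_current_notebook_path()
except ImportError:
    notebook_path = None  # library not installed in this environment

if notebook_path:
    print(f"Running in notebook: {notebook_path}")
else:
    print("No notebook context available (not in a Databricks notebook, or library missing).")
```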
Quickstart
```python
from databricks.connect_extras.context import current_spark_context

# This example retrieves a Spark session using databricks.connect_extras.
# To run it successfully outside a Databricks Notebook, you must:
# 1. Install Databricks Connect: pip install "databricks-connect[databricks-connect-dependencies]"
# 2. Configure Databricks Connect with `databricks-connect configure`,
#    or by setting environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN, DATABRICKS_CLUSTER_ID, etc.).

print("Attempting to get Spark session via databricks.connect_extras...")
try:
    spark = current_spark_context()
    if spark:
        print(f"Successfully retrieved SparkSession (Spark version: {spark.version})")
        # Example usage: create a simple DataFrame
        data = [("Alice", 1), ("Bob", 2), ("Charlie", 3)]
        df = spark.createDataFrame(data, ["Name", "Value"])
        print("\nExample DataFrame:")
        df.show()
    else:
        print("Spark session could not be retrieved. Ensure Databricks Connect is "
              "properly configured or run in a Databricks Notebook.")
except Exception as e:
    print(f"An error occurred: {e}")
    print("Ensure `databricks-connect` is installed and configured, and your "
          "environment is set up for Databricks Connect v2.")
```
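Outside a notebook, step 2 of the quickstart can also be satisfied via environment variables; the values below are placeholders to substitute with your own workspace details:

```shell
# Placeholder values — substitute your workspace URL, token, and cluster ID.
export DATABRICKS_HOST="https://<your-workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="<personal-access-token>"
export DATABRICKS_CLUSTER_ID="<cluster-id>"
```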