{"id":1443,"library":"databricks-connect","title":"Databricks Connect Client","description":"Databricks Connect lets you connect popular IDEs, notebook servers, and custom applications to Databricks clusters. It is a client library that configures the standard PySpark APIs to run commands remotely on a Databricks cluster, enabling local development and debugging against data on a remote cluster. The current version is 18.1.2; releases are tied closely to Databricks Runtime (DBR) versions, typically aligning with DBR major/LTS releases and subsequent patch updates.","status":"active","version":"18.1.2","language":"en","source_language":"en","source_url":"https://docs.databricks.com/en/dev-tools/databricks-connect/index.html","tags":["databricks","spark","pyspark","etl","data-engineering","remote-execution","development-tools"],"install":[{"cmd":"pip install databricks-connect==18.1.2","lang":"bash","label":"Install specific version"}],"dependencies":[],"imports":[{"note":"Databricks Connect configures the standard PySpark SparkSession; do not import SparkSession from databricks.connect.","wrong":"from databricks.connect import SparkSession","symbol":"SparkSession","correct":"from pyspark.sql import SparkSession"}],"quickstart":{"code":"import os\nfrom pyspark.sql import SparkSession\n\n# Configure environment variables (replace with your actual values).\n# For DBR 13.x and later, DATABRICKS_CLUSTER_ID is required.\n# For DBR 12.x and earlier, DATABRICKS_ORG_ID might be required.\n# DATABRICKS_PORT is optional and defaults to 15001.\n\nos.environ['DATABRICKS_HOST'] = os.environ.get('DATABRICKS_HOST', 'https://your-databricks-instance.cloud.databricks.com')\nos.environ['DATABRICKS_TOKEN'] = os.environ.get('DATABRICKS_TOKEN', 'dapi...')\nos.environ['DATABRICKS_CLUSTER_ID'] = os.environ.get('DATABRICKS_CLUSTER_ID', 'your-cluster-id')\n# os.environ['DATABRICKS_ORG_ID'] = os.environ.get('DATABRICKS_ORG_ID', 'your-org-id')  # Often not needed for modern DBR configurations\n# os.environ['DATABRICKS_PORT'] = os.environ.get('DATABRICKS_PORT', '15001')  # Default is 15001\n\n# Initialize a SparkSession using Databricks Connect.\n# The builder automatically picks up the DATABRICKS_* environment variables.\nspark = SparkSession.builder.getOrCreate()\n\n# Example: run a simple Spark command\ndf = spark.range(10).toDF(\"id\")\ndf.show()  # Local console output\n# df.display()  # Only available inside a Databricks notebook\n\nprint(\"Successfully connected to the Databricks cluster and ran a Spark command.\")\n\n# Clean up environment variables if running multiple tests or configurations:\n# del os.environ['DATABRICKS_HOST']\n# del os.environ['DATABRICKS_TOKEN']\n# del os.environ['DATABRICKS_CLUSTER_ID']\n# (and any others you set)\n","lang":"python","description":"This quickstart demonstrates how to initialize a SparkSession with Databricks Connect using environment variables for configuration. Ensure that `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, and `DATABRICKS_CLUSTER_ID` are set correctly. The `display()` function is only available inside a Databricks notebook; use `show()` for local console output."},"warnings":[{"fix":"Ensure your local Python environment's major and minor version exactly matches the requirement for your Databricks Connect client version (e.g., Python 3.12 for databricks-connect==18.x).","message":"Databricks Connect client versions are strictly tied to specific Python versions; for instance, Databricks Connect 18.x requires Python 3.12. Using an incompatible Python version will result in installation failures or runtime errors.","severity":"breaking","affected_versions":"All versions; specifically 18.x and later for Python 3.12"},{"fix":"Check your cluster's DBR version and install the corresponding `databricks-connect` client version (e.g., for DBR 13.3 LTS, install `databricks-connect==13.3.*`).","message":"The Databricks Connect client's major and minor version should match the Databricks Runtime (DBR) version of the cluster you are connecting to. Mismatched versions can lead to connection errors or unexpected behavior.","severity":"breaking","affected_versions":"All versions"},{"fix":"Install `databricks-connect` in a clean Python virtual environment, and do not install `pyspark` separately when using `databricks-connect`.","message":"Installing `databricks-connect` alongside an existing `pyspark` installation can lead to dependency conflicts or unexpected `pyspark` version mismatches, because `databricks-connect` bundles its own compatible `pyspark`.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Double-check that the `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, and `DATABRICKS_CLUSTER_ID` environment variables (or the values passed directly to the SparkSession builder) are correct and correspond to your target Databricks workspace and cluster, and ensure your token has sufficient permissions.","message":"Incorrect connection parameters (host, token, cluster ID, and sometimes org ID or port) are a very common cause of connection failures.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}