Databricks Feature Engineering

0.14.0 · active · verified Sat Apr 11

The `databricks-feature-engineering` library provides a Python client for Databricks Feature Engineering. It lets users create, manage, and use feature tables programmatically within Databricks, and it integrates with Databricks Workflows and MLflow. The current version is 0.14.0. Because the library is pre-1.0, its frequent minor releases bring new features and bug fixes, and occasionally breaking changes.

Warnings

Install

Imports

Quickstart

Initializes the `FeatureEngineeringClient`. The client handles authentication automatically when run within a Databricks environment or when the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables are set. The client can initialize without `pyspark`, but most feature table operations (such as `create_table` or `write_table`) require an active SparkSession.

import os
from databricks.feature_engineering import FeatureEngineeringClient

# For local execution outside a Databricks notebook, ensure these
# environment variables are set for authentication.
# os.environ['DATABRICKS_HOST'] = os.environ.get('DATABRICKS_HOST', 'https://<your-databricks-instance>.cloud.databricks.com')
# os.environ['DATABRICKS_TOKEN'] = os.environ.get('DATABRICKS_TOKEN', 'dapi...')

try:
    # The client automatically picks up credentials from the Databricks environment
    # or DATABRICKS_HOST/DATABRICKS_TOKEN environment variables.
    fe_client = FeatureEngineeringClient()
    print(f"Successfully initialized Databricks Feature Engineering Client: {type(fe_client)}")

    print("\nNote: Most Feature Engineering operations (like creating, writing, or reading tables)")
    print("require an active SparkSession, which is typically available in a Databricks notebook")
    print("or when using Databricks Connect (ensure it's configured locally).")
    print("\nFor example, to create a feature table or write data, you would need a 'spark' object.")

except Exception as e:
    print(f"Failed to initialize Databricks Feature Engineering Client: {e}")
    print("Please ensure you are running in a Databricks environment or have 'DATABRICKS_HOST' and 'DATABRICKS_TOKEN' environment variables set.")
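For local runs, it can help to check the two environment variables before constructing the client, so that a missing credential is reported by name instead of surfacing as a generic exception. The following is a minimal sketch under that assumption: `missing_databricks_env` and `make_local_client` are hypothetical helper names, not part of the library, and the check only applies outside a Databricks environment, where credentials are not picked up automatically.

```python
import os
from typing import List, Mapping

# Variables the quickstart relies on for local (non-notebook) authentication.
REQUIRED_VARS = ("DATABRICKS_HOST", "DATABRICKS_TOKEN")


def missing_databricks_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def make_local_client(env: Mapping[str, str] = os.environ):
    """Hypothetical helper: fail early with a clear message, then build the client."""
    missing = missing_databricks_env(env)
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    # Imported lazily so the check above works even if the package is absent.
    from databricks.feature_engineering import FeatureEngineeringClient

    return FeatureEngineeringClient()
```

Calling `missing_databricks_env()` with no arguments inspects the real process environment. Passing a plain dictionary instead makes the check easy to exercise in unit tests without touching real credentials.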
