Databricks Feature Engineering
The `databricks-feature-engineering` library provides a Python client for Databricks Feature Engineering. It lets users programmatically create, manage, and use feature tables within Databricks, streamlining the development and deployment of machine learning features, and it integrates with Databricks Workflows and MLflow. The current version is 0.14.0. Minor releases are frequent and, because the library is still pre-1.0, they occasionally include breaking changes alongside new features and bug fixes.
Warnings
- breaking The `online_store_client` parameter in `fe_client.write_table()` was renamed to `databricks_online_table_client`.
- breaking The `DatabricksDbfsClient` class was removed from the library.
- breaking The `online_store_client` argument was removed from `fe_client.create_online_table()` and `fe_client.drop_online_table()` methods.
- gotcha While the `FeatureEngineeringClient` can be instantiated independently, most core functionalities like creating, writing, or reading feature tables fundamentally rely on a SparkSession. Attempting these operations without one will result in errors.
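Because the table APIs need Spark, it can help to fail fast with a clear message before calling them. The following is a minimal sketch, not part of the library: the helper names `get_active_spark` and `require_spark` are hypothetical, and it assumes PySpark's `SparkSession.getActiveSession()` (available since PySpark 3.0) reflects the session you intend to use. In a Databricks notebook that is the predefined `spark` session; whether it also detects a Databricks Connect session may depend on the version.

```python
def get_active_spark():
    """Hypothetical helper: return the active SparkSession, or None.

    Returns None if PySpark is not installed or no session is running.
    """
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        return None
    # getActiveSession() returns None when no SparkContext/session is active.
    return SparkSession.getActiveSession()


def require_spark():
    """Hypothetical guard: raise before calling table APIs that need Spark."""
    spark = get_active_spark()
    if spark is None:
        raise RuntimeError(
            "No active SparkSession: run in a Databricks notebook or configure "
            "Databricks Connect before creating, writing, or reading feature tables."
        )
    return spark
```

Calling `require_spark()` at the top of a job turns a late, library-specific failure into an immediate, explicit one.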
Install
pip install databricks-feature-engineering
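Because the library is pre-1.0 and minor releases can break APIs, it may be worth pinning the version you tested against (for example `pip install databricks-feature-engineering==0.14.0`) and checking what is installed at runtime. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_fe_version` is illustrative, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_fe_version():
    """Illustrative helper: installed databricks-feature-engineering version, or None."""
    try:
        return version("databricks-feature-engineering")
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    print(installed_fe_version() or "databricks-feature-engineering is not installed")
```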
Imports
- FeatureEngineeringClient
from databricks.feature_engineering import FeatureEngineeringClient
Quickstart
import os
from databricks.feature_engineering import FeatureEngineeringClient

# For local execution outside a Databricks notebook, ensure these
# environment variables are set for authentication.
# os.environ.setdefault('DATABRICKS_HOST', 'https://<your-databricks-instance>.cloud.databricks.com')
# os.environ.setdefault('DATABRICKS_TOKEN', 'dapi...')

try:
    # The client automatically picks up credentials from the Databricks environment
    # or the DATABRICKS_HOST/DATABRICKS_TOKEN environment variables.
    fe_client = FeatureEngineeringClient()
    print(f"Successfully initialized Databricks Feature Engineering Client: {type(fe_client)}")
    print("\nNote: Most Feature Engineering operations (like creating, writing, or reading tables)")
    print("require an active SparkSession, which is typically available in a Databricks notebook")
    print("or when using Databricks Connect (ensure it's configured locally).")
    print("\nFor example, to create a feature table or write data, you would need a 'spark' object.")
except Exception as e:
    print(f"Failed to initialize Databricks Feature Engineering Client: {e}")
    print("Please ensure you are running in a Databricks environment or have 'DATABRICKS_HOST' and 'DATABRICKS_TOKEN' environment variables set.")
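Once a `spark` object is available, the Spark-dependent calls might look like the minimal sketch below. It assumes a Unity Catalog-enabled workspace where you can create tables in the chosen schema, plus the `fe_client` from the quickstart. The three-level table name `main.default.customer_features`, the `customer_id`/`total_purchases` columns, and the function name are hypothetical placeholders. The client and session are passed in as arguments so the logic can be exercised without a cluster.

```python
# Sketch only: FEATURE_TABLE, SCHEMA, and the column names are hypothetical placeholders.
FEATURE_TABLE = "main.default.customer_features"  # catalog.schema.table (assumed Unity Catalog)
SCHEMA = "customer_id INT, total_purchases DOUBLE"  # DDL-style schema string for Spark


def create_and_populate_feature_table(fe_client, spark, table_name=FEATURE_TABLE):
    """Create a feature table keyed on customer_id, upsert one row, and read it back.

    fe_client: a FeatureEngineeringClient instance.
    spark: an active SparkSession (e.g. the notebook's `spark` object).
    """
    # Seed data; building a DataFrame is what requires a live SparkSession.
    initial = spark.createDataFrame([(1, 120.0), (2, 45.5)], schema=SCHEMA)

    # Register the table; its schema is taken from `df`, and the rows are written.
    fe_client.create_table(
        name=table_name,
        primary_keys=["customer_id"],
        df=initial,
        description="Example customer features (hypothetical)",
    )

    # Subsequent updates: mode="merge" upserts rows by primary key.
    updates = spark.createDataFrame([(2, 60.0)], schema=SCHEMA)
    fe_client.write_table(name=table_name, df=updates, mode="merge")

    # Returns the feature table as a Spark DataFrame.
    return fe_client.read_table(name=table_name)
```

In a notebook you would call `create_and_populate_feature_table(fe_client, spark)`. `create_table` is meant for a table that does not exist yet, so on reruns you would typically call only `write_table`.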