DataHub Actions
1.5.0.14 | verified Mon Apr 27 | auth: no | python
An action framework for reacting to real-time changes in DataHub, including entity change events and metadata change sync. The current packaged version is 1.5.0.14, but GitHub releases lag behind: the main git version is v0.2.1. The library is actively maintained by Acryl Data.
pip install acryl-datahub-actions
Common errors
error ModuleNotFoundError: No module named 'datahub_actions'
cause Package not installed, or installed under a different name (the pip package is acryl-datahub-actions; the import name is datahub_actions).
fix
pip install acryl-datahub-actions
error ImportError: cannot import name 'Action' from 'datahub_actions'
cause Action class not exposed at package root as of v0.2.0+.
fix
from datahub_actions.action.action import Action
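Code that has to run against more than one installed version can try the documented locations in order. The following is a minimal sketch: `import_first` and `ACTION_CANDIDATES` are hypothetical names introduced here for illustration, and the two candidate paths are simply the ones listed in this entry.

```python
import importlib
from typing import Any, Iterable, Tuple


def import_first(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute that imports cleanly from the candidates.

    Each candidate is a (module_path, attribute_name) pair, tried in order.
    """
    errors = []
    for module_path, attr_name in candidates:
        try:
            module = importlib.import_module(module_path)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this location failed, then try the next one.
            errors.append(f"{module_path}.{attr_name}: {exc}")
    raise ImportError(
        "none of the candidates could be imported:\n" + "\n".join(errors)
    )


# Locations taken from this entry: the submodule path first, then the
# older package-root path as a fallback.
ACTION_CANDIDATES = [
    ("datahub_actions.action.action", "Action"),
    ("datahub_actions", "Action"),
]

# Usage (requires the package to be installed):
# Action = import_first(ACTION_CANDIDATES)
```

If neither location works, the raised ImportError lists the reason each candidate failed, which is usually enough to tell a missing package from a moved class.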
error UnboundVariable: The environment variable 'MY_VAR' is not set.
cause An environment variable referenced in the action config is not defined. In v0.1.8 the missing tenacity dependency may also be involved.
fix
Set the missing variable or upgrade to >=v0.1.10. For v0.1.8, also install tenacity.
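A pre-flight check can catch unset variables before the pipeline starts. This sketch assumes the action config refers to variables with `${NAME}` placeholders; `missing_env_vars` is a hypothetical helper, not part of the library, and it ignores placeholders that carry a default such as `${NAME:-value}`.

```python
import os
import re
from typing import Mapping, Optional, Set

# Matches plain ${NAME} references only (assumed placeholder syntax).
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_env_vars(
    config_text: str, env: Optional[Mapping[str, str]] = None
) -> Set[str]:
    """Return the names referenced in config_text that are absent from env."""
    env = os.environ if env is None else env
    referenced = set(_VAR_PATTERN.findall(config_text))
    return {name for name in referenced if name not in env}


# Example config fragment; the keys are illustrative only.
example_config = "server: ${DATAHUB_GMS_HOST}\ntoken: ${MY_VAR}\n"
unset = missing_env_vars(example_config, {"DATAHUB_GMS_HOST": "http://localhost:8080"})
print(sorted(unset))
```

Passing an explicit mapping, as in the example, keeps the check testable; omitting it checks the real process environment.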
error ModuleNotFoundError: No module named 'tenacity'
cause Missing dependency in v0.1.7/v0.1.8.
fix
pip install tenacity or upgrade to v0.1.10+
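Whether the modules named in the errors above are importable can be checked without importing them. This is a small sketch using only the standard library; `missing_modules` is a hypothetical helper name, and it expects top-level module names.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names from the error entries above.
required = ["datahub_actions", "tenacity"]
for name in missing_modules(required):
    print(f"missing module: {name}")
```

An empty result means both imports should resolve; any name printed points at the corresponding pip install fix listed above.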
Warnings
deprecated The v0.1.8 release has a known issue: missing tenacity dependency causing UnboundVariable exceptions.
fix Upgrade to v0.1.10 or later, or manually install tenacity: pip install tenacity
breaking The v0.2.0 release restructured imports; Action base class moved from top-level to submodule.
fix Use from datahub_actions.action.action import Action instead of from datahub_actions import Action
gotcha Many action classes require deep nested import paths; not exported from package root.
fix Inspect the package's submodules for correct paths, e.g., datahub_actions.action.metadata_change_sync.
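One way to inspect the submodules is to walk the installed package and report where a class is defined. The sketch below uses only the standard library; `find_class_modules` is a hypothetical helper, and because it imports every submodule it finds, it is best run in a scratch environment.

```python
import importlib
import inspect
import pkgutil
from typing import List


def find_class_modules(package_name: str, class_name: str) -> List[str]:
    """Return the submodules of package_name that define class_name."""
    package = importlib.import_module(package_name)
    found = []
    for info in pkgutil.walk_packages(
        package.__path__,
        prefix=package.__name__ + ".",
        onerror=lambda name: None,  # skip subpackages that fail to import
    ):
        if info.name.endswith(".__main__"):
            continue  # importing a __main__ module may start a CLI
        try:
            module = importlib.import_module(info.name)
        except Exception:
            continue  # optional dependencies may be missing
        obj = getattr(module, class_name, None)
        # Keep only the defining module, not modules that re-import the class.
        if inspect.isclass(obj) and obj.__module__ == module.__name__:
            found.append(module.__name__)
    return sorted(found)


# Demonstrated on a standard-library package so it runs anywhere.
print(find_class_modules("logging", "RotatingFileHandler"))

# With the package installed, the same call would be, for example:
# find_class_modules("datahub_actions", "MetadataChangeSyncAction")
```

The helper only searches submodules, so a class defined directly in the package's `__init__` would not be reported.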
deprecated The old pattern using DatahubRestSink and DatahubRestSource is deprecated in favor of DatahubSink/DatahubSource.
fix Use DatahubSource and DatahubSink (from datahub_actions.source and .sink).
Install
pip install 'acryl-datahub-actions[all]'
Imports
- MetadataChangeSyncAction
from datahub_actions.action.metadata_change_sync.metadata_change_sync_action import MetadataChangeSyncAction
- Action
wrong: from datahub_actions import Action
correct: from datahub_actions.action.action import Action
- Pipeline
from datahub_actions.pipeline import Pipeline
Quickstart
import os

from datahub_actions.pipeline import Pipeline
from datahub_actions.source.datahub_source import DatahubSource
from datahub_actions.sink.datahub_sink import DatahubSink

# Read connection settings once; both default to a local, unauthenticated GMS.
gms_host = os.environ.get('DATAHUB_GMS_HOST', 'http://localhost:8080')
gms_token = os.environ.get('DATAHUB_GMS_TOKEN', '')

# Configure a minimal pipeline; no actions are registered yet.
pipeline = Pipeline(
    source=DatahubSource(host=gms_host, token=gms_token),
    sink=DatahubSink(host=gms_host, token=gms_token),
    actions=[],
)
pipeline.run()