DataHub Actions

version 1.5.0.14 | verified Mon Apr 27 | auth: no | python

An action framework for reacting to real-time changes in DataHub, including entity change events and metadata change sync. The current package version is 1.5.0.14. GitHub releases lag behind the published package: the main git version is still v0.2.1. The library is actively maintained by Acryl Data.

pip install acryl-datahub-actions
error ModuleNotFoundError: No module named 'datahub_actions'
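cause The package is not installed, or it was installed under the wrong name. The distribution is named acryl-datahub-actions; the import name is datahub_actions.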
fix
pip install acryl-datahub-actions
error ImportError: cannot import name 'Action' from 'datahub_actions'
cause The Action class is not exposed at the package root from v0.2.0 onward.
fix
from datahub_actions.action.action import Action
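When code has to run against more than one installed version, the import can be resolved at runtime instead of being hard-coded. The sketch below is one way to do that with the standard library only; resolve_attr is a hypothetical helper, not part of datahub_actions, and the two candidate paths are the ones named in this entry.

```python
import importlib
from typing import Any, Optional, Sequence, Tuple


def resolve_attr(candidates: Sequence[Tuple[str, str]]) -> Optional[Any]:
    """Return the first attribute importable from the (module, name) pairs."""
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            # Module is not installed or the path does not exist in this version.
            continue
        if hasattr(module, attr_name):
            return getattr(module, attr_name)
    return None


# Try the submodule path first, then fall back to the package root.
Action = resolve_attr([
    ("datahub_actions.action.action", "Action"),
    ("datahub_actions", "Action"),
])
if Action is None:
    print("Action not found; is acryl-datahub-actions installed?")
```

If neither path resolves, Action is None and the message is printed, so the caller can fail with a clear error rather than a raw ImportError.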
error UnboundVariable: The environment variable 'MY_VAR' is not set.
cause An environment variable referenced in the action config is not set in the environment. In v0.1.8, the missing tenacity dependency can also surface as this error.
fix
Set the missing variable, or upgrade to v0.1.10 or later. On v0.1.8, also install tenacity.
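A pre-flight check can report every unset variable at once, before the pipeline starts. This is a minimal sketch that does not use the library's own config loader: missing_env_vars and the REQUIRED list are hypothetical, and MY_VAR is the placeholder name from the error message above.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    required: Iterable[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names in `required` that are not set in `env`."""
    return [name for name in required if name not in env]


# Hypothetical list; replace with the variables your action config references.
REQUIRED = ["MY_VAR"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Set these variables before starting the pipeline: " + ", ".join(missing))
```

The check only tests whether a variable is set, so an empty value counts as present.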
error ModuleNotFoundError: No module named 'tenacity'
cause tenacity is missing from the dependencies installed with v0.1.7 and v0.1.8.
fix
pip install tenacity or upgrade to v0.1.10+
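To decide between the two fixes, it helps to know which version is installed and whether tenacity is already importable. The sketch below uses only importlib from the standard library (Python 3.8+); installed_version and is_importable are hypothetical helper names.

```python
import importlib.util
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_importable(module_name: str) -> bool:
    """Report whether a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("acryl-datahub-actions:", installed_version("acryl-datahub-actions"))
print("tenacity importable:", is_importable("tenacity"))
```

is_importable is meant for top-level names such as tenacity; for a dotted name, find_spec raises ModuleNotFoundError when the parent package is missing.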
deprecated The v0.1.8 release has a known issue: missing tenacity dependency causing UnboundVariable exceptions.
fix Upgrade to v0.1.10 or later, or manually install tenacity: pip install tenacity
breaking The v0.2.0 release restructured imports; Action base class moved from top-level to submodule.
fix Use from datahub_actions.action.action import Action instead of from datahub_actions import Action
gotcha Many action classes are not exported from the package root and must be imported from deeply nested submodule paths.
fix Inspect the package's submodules for correct paths, e.g., datahub_actions.action.metadata_change_sync.
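One way to inspect the submodules is to walk the installed package with pkgutil. The sketch below is an illustration using the standard library; list_submodules is a hypothetical helper, and the output depends on the version you have installed.

```python
import importlib
import pkgutil
from typing import List


def list_submodules(package_name: str) -> List[str]:
    """Return the dotted names of all modules under an importable package."""
    package = importlib.import_module(package_name)
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []  # a plain module, not a package
    return sorted(
        info.name
        for info in pkgutil.walk_packages(
            search_path,
            prefix=package.__name__ + ".",
            onerror=lambda name: None,  # skip subpackages that fail to import
        )
    )


try:
    for name in list_submodules("datahub_actions"):
        print(name)
except ImportError:
    print("datahub_actions is not installed")
```

walk_packages imports each subpackage in order to descend into it, so run this in a scratch environment if import side effects are a concern.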
deprecated The old pattern using DatahubRestSink and DatahubRestSource is deprecated in favor of DatahubSink/DatahubSource.
fix Use DatahubSource and DatahubSink (from datahub_actions.source and datahub_actions.sink).
pip install 'acryl-datahub-actions[all]'

Minimal pipeline connecting DataHub source to DataHub sink with no actions.

import os

from datahub_actions.pipeline import Pipeline
from datahub_actions.sink.datahub_sink import DatahubSink
from datahub_actions.source.datahub_source import DatahubSource

# Read the connection settings once; source and sink use the same DataHub instance.
host = os.environ.get('DATAHUB_GMS_HOST', 'http://localhost:8080')
token = os.environ.get('DATAHUB_GMS_TOKEN', '')

# Build a pipeline with a DataHub source and sink and no actions attached.
pipeline = Pipeline(
    source=DatahubSource(host=host, token=token),
    sink=DatahubSink(host=host, token=token),
    actions=[],
)
pipeline.run()