Acryl DataHub Python Client
0.999.1 | verified Fri May 01 | auth: no | python | deprecated
DataHub is an open-source metadata platform for the modern data stack. The Python client, published on PyPI as 'acryl-datahub', provides utilities for ingesting metadata, interacting with the DataHub API, and managing data lineage. As of version 0.12.1, the package is distributed as 'acryl-datahub' rather than 'datahub' on PyPI; the module you import is still named 'datahub'. The PyPI package 'datahub' (version 0.999.1) is a dummy placeholder and should be avoided in favor of 'acryl-datahub'.
pip install acryl-datahub
Common errors
error ModuleNotFoundError: No module named 'datahub'
cause The 'acryl-datahub' package, which provides the importable 'datahub' module, is not installed in the active environment.
fix
pip install acryl-datahub
error ImportError: cannot import name 'DataHubRestEmitter' from 'datahub'
cause DataHubRestEmitter is not exported from the top-level 'datahub' module, or the wrong package is installed.
fix
Install acryl-datahub, then import from the submodule: from datahub.emitter.rest_emitter import DataHubRestEmitter
error AttributeError: module 'datahub' has no attribute 'emitter'
cause The dummy 'datahub' package or an outdated version of acryl-datahub is installed.
fix
pip install --upgrade acryl-datahub
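All three errors above come down to which distribution is installed. A minimal sketch of checking this from Python with the standard library follows; installed_version and diagnose are hypothetical helper names introduced for illustration and are not part of acryl-datahub.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose(real_version, dummy_version):
    """Map the two installed versions to a suggested action (illustrative labels)."""
    if dummy_version is not None:
        # The placeholder 'datahub' distribution is present and should be removed.
        return "remove-dummy"
    if real_version is None:
        # Neither distribution is present: the client still has to be installed.
        return "install-client"
    return "ok"


status = diagnose(installed_version("acryl-datahub"), installed_version("datahub"))
print(status)
```

A result of "remove-dummy" corresponds to the uninstall-and-reinstall fix, and "install-client" to a plain pip install acryl-datahub.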
Warnings
deprecated The 'datahub' package on PyPI (version 0.999.1) is a dummy placeholder. Install 'acryl-datahub' instead.
fix Run: pip uninstall datahub && pip install acryl-datahub
breaking In older versions (pre-0.9.0), the import paths were different. The emitter is now imported from 'datahub.emitter.rest_emitter'. DataHubRestEmitter is a synchronous class and does not require async/await.
fix Upgrade to the latest acryl-datahub and use the import paths listed under Imports.
gotcha DataHubRestEmitter is synchronous: call .emit() or .emit_mcp() directly, without await or an event loop. Pass a metadata event object such as a MetadataChangeProposalWrapper rather than loose keyword arguments.
fix Use DataHubRestEmitter for synchronous emits; no event loop is needed.
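If the calling code is itself asynchronous, one common pattern is to offload a blocking call to a worker thread. The sketch below illustrates that pattern only: blocking_emit is a hypothetical stand-in for a synchronous call such as emitter.emit(...), not a DataHub API, and asyncio.to_thread assumes Python 3.9 or later.

```python
import asyncio
import time


def blocking_emit(payload):
    """Hypothetical stand-in for a synchronous emit call."""
    time.sleep(0.01)  # simulate a blocking HTTP round trip
    return f"emitted {payload}"


async def emit_from_async(payload):
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_emit, payload)


result = asyncio.run(emit_from_async("dataset-properties"))
print(result)
```

In purely synchronous scripts none of this is needed; the emitter can be called directly.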
Imports
- DataHubRestEmitter
wrong: from datahub import DataHubRestEmitter
correct: from datahub.emitter.rest_emitter import DataHubRestEmitter
- DataHubGraph
wrong: from datahub.graph import DataHubGraph
correct: from datahub.ingestion.graph.client import DataHubGraph
Quickstart
import os

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DataHubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Create an emitter that talks to the DataHub server
server = os.environ.get('DATAHUB_SERVER', 'http://localhost:8080')
token = os.environ.get('DATAHUB_TOKEN') or None  # None when no token is set
emitter = DataHubRestEmitter(gms_server=server, token=token)

# Wrap the aspect in a metadata change proposal, then emit it
mcp = MetadataChangeProposalWrapper(
    entityUrn='urn:li:dataset:(urn:li:dataPlatform:hive,test_dataset,PROD)',
    aspect=DatasetPropertiesClass(description='Test dataset'),
)
emitter.emit_mcp(mcp)
print('Metadata emitted successfully')
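The dataset URN in the quickstart follows the pattern urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>). A minimal sketch of assembling such a string is shown below. build_dataset_urn is a hypothetical helper written for illustration and is not part of acryl-datahub; in real code the library's own URN builders in datahub.emitter.mce_builder (for example make_dataset_urn) are likely the better choice.

```python
def build_dataset_urn(platform, name, env="PROD"):
    """Assemble a dataset URN string (illustrative helper, not a DataHub API)."""
    if not platform or not name:
        raise ValueError("platform and name must be non-empty")
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


urn = build_dataset_urn("hive", "test_dataset")
print(urn)
```

Building the URN from its parts avoids typos in the nested parentheses and commas when many datasets are emitted in a loop.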