Acryl DataHub Python Client

0.999.1 verified Fri May 01 auth: no python deprecated

DataHub is an open-source metadata platform for the modern data stack. The Python client (acryl-datahub) provides utilities for ingesting metadata, interacting with the DataHub API, and managing data lineage. As of version 0.12.1, the client is distributed on PyPI as 'acryl-datahub' rather than 'datahub'. The 'datahub' package on PyPI (0.999.1) is a dummy placeholder; install 'acryl-datahub' instead. The import name is still 'datahub' in both cases.

pip install acryl-datahub
error ModuleNotFoundError: No module named 'datahub'
cause 'acryl-datahub' is not installed in the active environment. The import name is 'datahub', but the client is shipped in the 'acryl-datahub' distribution.
fix
pip install acryl-datahub
error ImportError: cannot import name 'DataHubRestEmitter' from 'datahub'
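cause The class is not exported from the top-level 'datahub' package; it lives in the 'datahub.emitter.rest_emitter' submodule. The same error appears when the placeholder package is installed instead of acryl-datahub.
fix
Install acryl-datahub and import from the submodule: from datahub.emitter.rest_emitter import DataHubRestEmitter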
error AttributeError: module 'datahub' has no attribute 'emitter'
cause Using the dummy package or an outdated version.
fix
pip install --upgrade acryl-datahub
deprecated The 'datahub' package on PyPI (version 0.999.1) is a dummy placeholder. Install 'acryl-datahub' instead.
fix Run: pip uninstall datahub && pip install acryl-datahub
breaking In older versions (pre-0.9.0), the import paths were different. In current releases the REST emitter is imported from 'datahub.emitter.rest_emitter'.
fix Upgrade to the latest acryl-datahub and import from 'datahub.emitter.rest_emitter'.
gotcha DataHubRestEmitter is synchronous: emit() and emit_mcp() block until the HTTP request completes, so no event loop or async/await is needed. emit_mcp() takes a single MetadataChangeProposalWrapper, not separate entity_urn and aspect keyword arguments.
fix Build a MetadataChangeProposalWrapper(entityUrn=..., aspect=...) and pass it to emitter.emit_mcp().
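
Most of the errors above come down to which distribution is actually installed. The following is a minimal diagnostic sketch using only the standard library; the helper names installed_version and diagnose are hypothetical and are not part of acryl-datahub.

```python
from importlib import metadata
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose():
    """Report which DataHub-related distributions are present (hypothetical helper)."""
    return {
        # The real client distribution.
        "acryl-datahub": installed_version("acryl-datahub"),
        # The placeholder distribution described above.
        "datahub": installed_version("datahub"),
        # True only when a top-level `datahub` module can be found on the path.
        "datahub_importable": find_spec("datahub") is not None,
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If "acryl-datahub" reports None while "datahub" reports a version, the placeholder is likely the cause, and the uninstall/install fix above applies.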

Quickstart: Emit a dataset metadata change proposal (MCP).

import os

from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DataHubRestEmitter
from datahub.metadata.schema_classes import DatasetPropertiesClass

# Create an emitter pointed at the DataHub GMS endpoint
server = os.environ.get('DATAHUB_SERVER', 'http://localhost:8080')
token = os.environ.get('DATAHUB_TOKEN') or None  # None when no token is set
emitter = DataHubRestEmitter(gms_server=server, token=token)

# Wrap the aspect in a metadata change proposal; emit_mcp() expects this wrapper
mcp = MetadataChangeProposalWrapper(
    entityUrn='urn:li:dataset:(urn:li:dataPlatform:hive,test_dataset,PROD)',
    aspect=DatasetPropertiesClass(description='Test dataset'),
)
emitter.emit_mcp(mcp)
print('Metadata emitted successfully')
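
The dataset URN in the quickstart follows a fixed shape: a data platform URN, a dataset name, and an environment. The sketch below only illustrates that shape in plain Python; dataset_urn is a hypothetical helper, not a library function. The client ships its own builder, make_dataset_urn in datahub.emitter.mce_builder, which is the better choice in real code.

```python
def dataset_urn(platform, name, env="PROD"):
    """Format a dataset URN in the shape used by the quickstart (illustrative only)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


if __name__ == "__main__":
    print(dataset_urn("hive", "test_dataset"))
```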