Acryl DataHub Dagster Plugin

1.5.0.17 verified Fri May 01 auth: no python

A Dagster plugin that captures pipeline execution metadata and sends it to DataHub for data lineage and observability. Current version: 1.5.0.17. Requires Python >=3.10. Released as part of the DataHub project.

pip install acryl-datahub-dagster-plugin
error ModuleNotFoundError: No module named 'acryl_datahub_dagster_plugin'
cause Incorrect import path; the importable module is 'datahub_dagster_plugin', without the 'acryl_' prefix used in the PyPI name.
fix
Change imports to use 'datahub_dagster_plugin' instead of 'acryl_datahub_dagster_plugin'.
error ModuleNotFoundError: No module named 'datahub_dagster_plugin'
cause The plugin is not installed in the active Python environment, or only the deprecated 'datahub-dagster-plugin' package is installed.
fix
Run 'pip install acryl-datahub-dagster-plugin', then confirm the installation with 'pip show acryl-datahub-dagster-plugin'.
error ImportError: cannot import name 'DatahubDagsterResource' from 'datahub_dagster_plugin'
cause The class might be in a submodule. In recent versions, it is in 'datahub_dagster_plugin.resources'.
fix
Use: 'from datahub_dagster_plugin.resources import DatahubDagsterResource'.
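To tell the three import errors above apart, it can help to check which module names actually resolve in the active environment. The following is a minimal sketch using only the standard library; the helper name find_importable is hypothetical and not part of the plugin, and the submodule name in the candidate list is taken from the entries above rather than verified against a particular release.

```python
import importlib.util


def find_importable(candidates):
    """Return the candidate module names that resolve in this environment."""
    found = []
    for name in candidates:
        try:
            # For dotted names, find_spec imports the parent package first,
            # so a missing parent raises ModuleNotFoundError (an ImportError).
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            spec = None
        if spec is not None:
            found.append(name)
    return found


if __name__ == "__main__":
    # Candidate names are the ones mentioned in the error entries above.
    names = [
        "datahub_dagster_plugin",
        "datahub_dagster_plugin.resources",
        "acryl_datahub_dagster_plugin",
    ]
    print(find_importable(names))
```

If the returned list is empty, the plugin is most likely not installed in this interpreter. If only the top-level name appears, the class probably lives in a different submodule in the installed version.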
gotcha The import package is 'datahub_dagster_plugin', while the PyPI distribution is 'acryl-datahub-dagster-plugin'. The import name drops the 'acryl' prefix and uses underscores instead of hyphens, so many users mistakenly import from 'acryl_datahub_dagster_plugin'.
fix Use 'from datahub_dagster_plugin.hooks import DatahubDagsterHook' or 'from datahub_dagster_plugin.resources import DatahubDagsterResource'.
breaking In version 1.0.0, the plugin was rewritten to use the new DataHub Python SDK (acryl-datahub). The old 'datahub-dagster-plugin' is deprecated and removed. Users must migrate to 'acryl-datahub-dagster-plugin' and update imports.
fix Uninstall the old 'datahub-dagster-plugin' and install 'acryl-datahub-dagster-plugin'. Keep importing from 'datahub_dagster_plugin'; there is no 'acryl_datahub_dagster_plugin' module. Check the documentation for the submodule paths in your installed version.
deprecated The 'datahub-dagster-plugin' (without 'acryl-') is deprecated and no longer maintained. Users should switch to 'acryl-datahub-dagster-plugin'.
fix Use 'pip install acryl-datahub-dagster-plugin' and import from 'datahub_dagster_plugin'.
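Because the deprecated and current distributions have similar names, it may be worth checking which of them is installed before changing any imports. This is a rough sketch based on importlib.metadata from the standard library; the helper names installed_version and plugin_install_report are illustrative assumptions, and the distribution names are the two mentioned above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def plugin_install_report(
    current="acryl-datahub-dagster-plugin",
    deprecated="datahub-dagster-plugin",
):
    """Map each distribution name to its installed version (or None)."""
    return {
        current: installed_version(current),
        deprecated: installed_version(deprecated),
    }


if __name__ == "__main__":
    for name, version in plugin_install_report().items():
        print(f"{name}: {version or 'not installed'}")
```

A version reported for the deprecated name suggests that package should be uninstalled before installing the current one.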

Define a simple Dagster job with a DatahubDagsterResource to emit metadata. The GMS host is read from the DATAHUB_GMS_HOST environment variable and falls back to http://localhost:8080 when it is unset.

import os
from dagster import job, op, OpExecutionContext
from datahub_dagster_plugin.resources import DatahubDagsterResource
from datahub.emitter.rest_emitter import DatahubRestEmitter

@op(required_resource_keys={'datahub'})
def my_op(context: OpExecutionContext):
    context.log.info("Running op")
    return 1

@job(resource_defs={
    'datahub': DatahubDagsterResource(
        emitter=DatahubRestEmitter(gms_server=os.environ.get('DATAHUB_GMS_HOST', 'http://localhost:8080'))
    )
})
def my_job():
    my_op()

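if __name__ == '__main__':
    # Run the job in-process and report whether it succeeded.
    result = my_job.execute_in_process()
    print(f"Job succeeded: {result.success}")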