{"id":1892,"library":"acryl-datahub","title":"DataHub Python CLI and SDK","description":"The `acryl-datahub` package provides the `datahub` command-line interface (CLI) and a Python SDK for interacting with DataHub, an open-source metadata platform. DataHub acts as a central metadata layer for a data stack, supporting discovery, governance, and observability across data assets. The current version is 1.5.0.5; the project releases frequently and publishes release candidates between stable versions.","status":"active","version":"1.5.0.5","language":"en","source_language":"en","source_url":"https://github.com/datahub-project/datahub","tags":["metadata","data-governance","cli","sdk","data-catalog","ai-ready"],"install":[{"cmd":"pip install acryl-datahub","lang":"bash","label":"Install core package"},{"cmd":"pip install 'acryl-datahub[datahub-rest]' # For programmatic interaction over REST","lang":"bash","label":"Install with REST emitter"}],"dependencies":[{"reason":"Required for the acryl-datahub CLI and SDK.","package":"Python","version":">=3.10"},{"reason":"Required by internal components; Pydantic v1 support was dropped in v1.4.0.2.","package":"pydantic","version":">=2.0"}],"imports":[{"note":"Sends metadata change proposals to DataHub over REST.","symbol":"DatahubRestEmitter","correct":"from datahub.emitter.rest_emitter import DatahubRestEmitter"},{"note":"Wraps an entity URN and a single aspect into a metadata change proposal (MCP) for emission.","symbol":"MetadataChangeProposalWrapper","correct":"from datahub.emitter.mcp import MetadataChangeProposalWrapper"},{"note":"One of the generated aspect classes in `datahub.metadata.schema_classes`; holds dataset-level properties such as the description and custom properties.","symbol":"DatasetPropertiesClass","correct":"from datahub.metadata.schema_classes import DatasetPropertiesClass"},{"note":"Holds connection settings (such as the server URL and token) used to construct a `DataHubGraph` client, which queries and updates DataHub programmatically, including through its GraphQL API.","symbol":"DatahubClientConfig","correct":"from datahub.ingestion.graph.client import 
DatahubClientConfig, DataHubGraph"}],"quickstart":{"code":"import os\nfrom datahub.emitter.rest_emitter import DatahubRestEmitter\nfrom datahub.emitter.mcp import MetadataChangeProposalWrapper\nfrom datahub.metadata.schema_classes import DatasetPropertiesClass\n\n# --- CLI Quickstart (run in your terminal) ---\n# 1. Install Docker and Docker Compose v2.\n# 2. Start a local DataHub instance:\n#    datahub docker quickstart\n#    (The first run may take several minutes while images are downloaded and services start.)\n#\n# --- Python SDK Example (after DataHub is running) ---\n# For a local quickstart, the GMS server is typically http://localhost:8080\ngms_server = os.environ.get(\"DATAHUB_GMS_SERVER\", \"http://localhost:8080\")\n# Access token: required for DataHub Cloud and secured instances; None when unset.\ntoken = os.environ.get(\"DATAHUB_GMS_TOKEN\")\n\n# Initialize the REST emitter.\n# The token is passed directly via `token`; there is no need to set an\n# Authorization header through `extra_headers`.\nemitter = DatahubRestEmitter(gms_server=gms_server, token=token)\n\n# Define a sample dataset URN\ndataset_urn = \"urn:li:dataset:(urn:li:dataPlatform:hive,sample_dataset,PROD)\"\n\n# Create a DatasetProperties aspect\ndataset_properties = DatasetPropertiesClass(\n    description=\"This is a sample dataset emitted via the Python SDK quickstart.\",\n    customProperties={\n        \"owner_team\": \"data_platform\",\n        \"environment\": \"production_dev\"\n    }\n)\n\n# Wrap the target URN and the aspect in a MetadataChangeProposalWrapper\nmcp = MetadataChangeProposalWrapper(\n    entityUrn=dataset_urn,\n    aspect=dataset_properties,\n)\n\n# Emit the metadata change proposal\ntry:\n    emitter.emit(mcp)\n    print(f\"Successfully emitted properties for dataset: {dataset_urn}\")\nexcept Exception as e:\n    print(f\"Failed to emit metadata: {e}\")\n    print(\"Ensure your DataHub instance is running and accessible at\", gms_server)\n","lang":"python","description":"This quickstart first shows how to start a local DataHub instance with the CLI's `docker quickstart` command. 
Following this, a Python snippet connects to the DataHub server with `DatahubRestEmitter` and emits basic properties (a description and custom properties) for a sample dataset."},"warnings":[{"fix":"Upgrade your Python environment to version 3.10 or newer before upgrading `acryl-datahub`.","message":"Python 3.9 support has been dropped. All `acryl-datahub` packages now require Python 3.10 or later.","severity":"breaking","affected_versions":"v1.4.0 and later"},{"fix":"In your DataHub GMS configuration, set `THEME_V2_ENABLED=true`, `THEME_V2_DEFAULT=true`, and `THEME_V2_TOGGLEABLE=false`.","message":"The V1 UI theme is sunset as of v1.5.0, and all further development targets the V2 UI. If you self-host DataHub, make sure the V2 theme is enabled through the GMS environment variables `THEME_V2_ENABLED` and `THEME_V2_DEFAULT`.","severity":"breaking","affected_versions":"v1.5.0 and later"},{"fix":"Ensure `pydantic>=2.0` is installed. If other packages in the same environment still require Pydantic v1, install `acryl-datahub` in a separate virtual environment.","message":"`acryl-datahub` now requires Pydantic v2; support for Pydantic v1 has been dropped.","severity":"breaking","affected_versions":"v1.4.0.2 and later"},{"fix":"Use stateful ingestion to clean up the orphaned query entities, then re-ingest so that view lineage is recorded under the new SHA-256-based query URNs.","message":"SQL view query IDs are now derived from a SHA-256 hash instead of the URL-encoded view URN. As a result, query entities previously created for view lineage tracking become orphaned.","severity":"breaking","affected_versions":"v1.5.0 and later"},{"fix":"For production deployments, explicitly set the `DATAHUB_TOKEN_SERVICE_SIGNING_KEY` and `DATAHUB_TOKEN_SERVICE_SALT` environment variables to your own secure values.","message":"As of DataHub CLI 1.5, the handling of the token signing key for Metadata Service Authentication has changed. 
If the signing key and salt are not explicitly set via environment variables, new random values are generated and stored locally in `~/.datahub/quickstart/.local-secrets.env`.","severity":"gotcha","affected_versions":"v1.5.0 and later"},{"fix":"Update any code that relies on `emit()` or `emit_mcp()` returning `None` or an `int`. If you need trace information, check whether the returned value is a `TraceData` instance; it may be `None`.","message":"`DatahubRestEmitter.emit()` and `emit_mcp()` now return `Optional[TraceData]` instead of `None` or an `int`, which exposes trace IDs when using the `SYNC_PRIMARY` and `ASYNC` emit modes.","severity":"gotcha","affected_versions":"v1.5.0 and later"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}