{"library":"openlineage-python","title":"OpenLineage Python Client","description":"OpenLineage Python Client is the official Python library for interacting with the OpenLineage standard. It lets users emit lineage metadata events from Python code to an OpenLineage backend (such as Marquez) for data governance and observability. It is actively maintained with frequent releases, currently at version 1.45.0, and forms the basis for integrations such as Airflow and dbt.","status":"active","version":"1.45.0","language":"en","source_language":"en","source_url":"https://github.com/OpenLineage/OpenLineage","tags":["data lineage","metadata","etl","data governance","client library"],"install":[{"cmd":"pip install openlineage-python","lang":"bash","label":"Core client"},{"cmd":"pip install openlineage-python[fsspec]","lang":"bash","label":"Remote filesystem (S3, GCS, Azure)"},{"cmd":"pip install openlineage-python[kafka]","lang":"bash","label":"Kafka transport"},{"cmd":"pip install openlineage-python[msk-iam]","lang":"bash","label":"AWS MSK IAM authentication"},{"cmd":"pip install openlineage-python[datazone]","lang":"bash","label":"AWS DataZone integration"}],"dependencies":[{"reason":"For remote filesystem support (e.g., S3, GCS, Azure) via the `fsspec` extra.","package":"fsspec","optional":true},{"reason":"Required for Kafka transport via the `kafka` extra.","package":"confluent-kafka","optional":true}],"imports":[{"symbol":"OpenLineageClient","correct":"from openlineage.client.client import OpenLineageClient"},{"note":"Use the `event_v2` module for the current OpenLineage spec; the legacy event classes live in `openlineage.client.run`, and there is no `openlineage.client.event` module.","wrong":"from openlineage.client.event import RunEvent","symbol":"RunEvent","correct":"from openlineage.client.event_v2 import RunEvent"},{"symbol":"RunState","correct":"from openlineage.client.event_v2 import RunState"},{"symbol":"Job","correct":"from openlineage.client.event_v2 import Job"},{"symbol":"InputDataset","correct":"from 
openlineage.client.event_v2 import InputDataset"},{"symbol":"OutputDataset","correct":"from openlineage.client.event_v2 import OutputDataset"},{"symbol":"Run","correct":"from openlineage.client.event_v2 import Run"}],
"quickstart":{"code":"import logging\nimport uuid\nfrom datetime import datetime, timezone\n\nfrom openlineage.client.client import OpenLineageClient\nfrom openlineage.client.event_v2 import InputDataset, Job, OutputDataset, Run, RunEvent, RunState\nfrom openlineage.client.transport.console import ConsoleConfig, ConsoleTransport\n\n# The console transport emits events through the logging module, so enable it\nlogging.basicConfig(level=logging.DEBUG)\n\n# URI identifying the producer of these events; use your own project URL\nPRODUCER = \"https://github.com/my-org/my-app\"\nNAMESPACE = \"my_app_namespace\"\n\n# Use the console transport for demonstration. In production, point the client\n# at a backend with OPENLINEAGE_URL or an openlineage.yml config file.\nclient = OpenLineageClient(transport=ConsoleTransport(ConsoleConfig()))\n\n\ndef emit_run_event(state, run_id, job_name):\n    \"\"\"Build and emit a RunEvent for the given run state.\"\"\"\n    event = RunEvent(\n        eventType=state,\n        eventTime=datetime.now(timezone.utc).isoformat(),\n        run=Run(runId=run_id),\n        job=Job(namespace=NAMESPACE, name=job_name),\n        inputs=[InputDataset(namespace=NAMESPACE, name=\"input_data\")],\n        outputs=[OutputDataset(namespace=NAMESPACE, name=\"processed_data\")],\n        producer=PRODUCER,\n    )\n    client.emit(event)\n    print(f\"Emitted {state.value} event for job '{job_name}' (run {run_id})\")\n\n\ndef my_data_processing_job():\n    job_name = \"my_simple_job\"\n    run_id = str(uuid.uuid4())\n\n    # 1. Emit START event\n    emit_run_event(RunState.START, run_id, job_name)\n    try:\n        # Simulate data processing; add real logic here\n        print(f\"Processing data for job '{job_name}'...\")\n        # 2. Emit COMPLETE event on success\n        emit_run_event(RunState.COMPLETE, run_id, job_name)\n    except Exception as exc:\n        print(f\"Job '{job_name}' failed: {exc}\")\n        # 3. Emit FAIL event on failure\n        emit_run_event(RunState.FAIL, run_id, job_name)\n        raise\n\n\nif __name__ == \"__main__\":\n    my_data_processing_job()","lang":"python","description":"This quickstart demonstrates how to initialize the `OpenLineageClient` and manually emit `START` and `COMPLETE` (or `FAIL`) events for a data processing job. It configures the console transport explicitly, which logs each event to standard output, so you can inspect the generated lineage without running a full OpenLineage backend such as Marquez."},
"warnings":[{"fix":"Ensure your configuration source (file or environment variables) is correctly prioritized and accessible by the client.","message":"The OpenLineage client can be configured via `openlineage.yml` file (searched in `OPENLINEAGE_CONFIG` env var, CWD, or `$HOME/.openlineage`) or directly via environment variables like `OPENLINEAGE_URL` and `OPENLINEAGE_API_KEY`. Environment variables typically override config file settings for HTTP transport.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Regularly upgrade both the `openlineage-python` client and the `apache-airflow-providers-openlineage` to their latest compatible versions.","message":"When using `openlineage-python` with the `apache-airflow-providers-openlineage`, it's crucial to understand their roles. The Python client (`openlineage-python`) handles event transmission, while the Airflow provider extracts Airflow-specific metadata. Both should be kept updated independently, as the client has no Airflow version dependencies.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Consider using manual annotation (e.g., custom facets) or developing custom extractors to provide more detailed lineage for these operators.","message":"Lineage extraction for generic operators like `PythonOperator` or `KubernetesPodOperator` in Airflow might be limited due to their 'black box' nature. Full input/output dataset metadata may not be automatically captured.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade your Spark environment to version 3.x or later if you are using `openlineage-python` 1.38.0 or newer for Spark integrations.","message":"Support for Spark 2.x versions was dropped in `openlineage-python` version 1.38.0. 
The minimum supported Spark version is now 3.x.","severity":"breaking","affected_versions":">=1.38.0"},{"fix":"Install the client with the `kafka` extra: `pip install openlineage-python[kafka]`.","message":"The `KafkaTransport` will fail to initialize if the `confluent-kafka` package is not installed. This dependency is part of the `openlineage-python[kafka]` extra.","severity":"gotcha","affected_versions":"All versions using Kafka transport"}],"env_vars":null,"last_verified":"2026-04-05T00:00:00.000Z","next_check":"2026-07-04T00:00:00.000Z"}