OpenLineage Integration Common Library
The `openlineage-integration-common` library provides shared data models, utilities, and provider-specific helpers for building OpenLineage integrations in Python. It builds on the `openlineage-python` client and serves as a foundational dependency for other OpenLineage integrations (such as the dbt integration) and for custom integrations. The current version is 1.46.0. The library is released frequently, typically alongside the main OpenLineage project.
Common errors
- ModuleNotFoundError: No module named 'openlineage.common'
  cause: The `openlineage-integration-common` package is not installed in the active Python environment, or a module from it is imported under the wrong name or from an interpreter whose path does not include the package.
  fix: Install the package into the same environment that runs your code: `pip install openlineage-integration-common`. Note that the distribution name (`openlineage-integration-common`) differs from the import path (`openlineage.common`).
- AttributeError: 'NoneType' object has no attribute 'host'
  cause: This error is typically raised inside an OpenLineage Airflow extractor that reads connection details (such as host or port) from an Airflow connection object that is `None` or was never resolved, often because the connection is stored in a secrets backend the extractor cannot reach.
  fix: Verify that Airflow connections are configured correctly and are accessible to the process running the extractor. This may require configuring Airflow's secrets backend for that process, or providing connection details explicitly when the extractor cannot resolve them on its own.
- ValueError: OpenLineage is missing configuration, please refer to the OL setup docs.
  cause: The OpenLineage client, or an integration such as the Airflow provider, cannot find required configuration, such as the backend URL (`OPENLINEAGE_URL`) or namespace (`OPENLINEAGE_NAMESPACE`), and therefore cannot emit events.
  fix: Set the required environment variables (for example, `OPENLINEAGE_URL=http://localhost:5000 OPENLINEAGE_NAMESPACE=default`) or provide an `openlineage.yml` configuration file. The Python client looks for this file at the path given in `OPENLINEAGE_CONFIG`, then in the current working directory, then in `$HOME/.openlineage/`.
- The Airflow scheduler and triggerer fail to load the OpenLineage plugin when custom extractors are configured
  cause: The scheduler or triggerer process cannot import or load the custom OpenLineage extractors. This is usually caused by an incorrect path in the `OPENLINEAGE_EXTRACTORS` environment variable, or by circular imports in the extractor code that prevent the module from loading.
  fix: Verify that every entry in `OPENLINEAGE_EXTRACTORS` (a semicolon-separated list of full import paths to extractor classes) is importable from the Python environment of each Airflow component that loads the plugin, including the scheduler and triggerer, not only the workers. Also keep top-level Airflow imports out of custom extractor modules: move them inside methods, or guard type-only imports with `typing.TYPE_CHECKING`, to avoid circular dependencies.
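Three of the errors above (the missing module, the missing configuration, and unimportable custom extractors) can often be caught before Airflow or the client starts. The following is a minimal, stdlib-only sketch of such a pre-flight check. The function names are hypothetical and not part of any OpenLineage package; the configuration check assumes the client reads `OPENLINEAGE_CONFIG`, `./openlineage.yml`, `~/.openlineage/openlineage.yml`, or `OPENLINEAGE_URL`, and only approximates the client's real lookup. It also assumes `OPENLINEAGE_EXTRACTORS` holds semicolon-separated `module.ClassName` paths.

```python
import importlib
import importlib.util
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def module_available(name: str) -> bool:
    """Return True if `name` (e.g. 'openlineage.common') can be found for import."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages, so a missing parent raises instead of returning None.
        return False


def has_openlineage_config(env: Mapping[str, str], cwd: Path, home: Path) -> bool:
    """Assumed lookup: OPENLINEAGE_CONFIG, ./openlineage.yml, ~/.openlineage/openlineage.yml, OPENLINEAGE_URL."""
    config_path = env.get("OPENLINEAGE_CONFIG")
    if config_path and Path(config_path).is_file():
        return True
    candidates = [cwd / "openlineage.yml", home / ".openlineage" / "openlineage.yml"]
    if any(candidate.is_file() for candidate in candidates):
        return True
    return bool(env.get("OPENLINEAGE_URL"))


def check_extractor_paths(value: str) -> Dict[str, Optional[str]]:
    """Map each path in a semicolon-separated OPENLINEAGE_EXTRACTORS value to an error message (None if OK)."""
    results: Dict[str, Optional[str]] = {}
    for path in filter(None, (p.strip() for p in value.split(";"))):
        module_name, _, class_name = path.rpartition(".")
        if not module_name:
            results[path] = "not a full 'module.ClassName' path"
            continue
        try:
            module = importlib.import_module(module_name)
        except Exception as exc:  # ImportError (including circular-import failures) or errors raised at module load
            results[path] = f"import failed: {exc!r}"
            continue
        results[path] = None if hasattr(module, class_name) else f"{class_name} not found in {module_name}"
    return results


if __name__ == "__main__":
    print("openlineage.common importable:", module_available("openlineage.common"))
    print("config found:", has_openlineage_config(os.environ, Path.cwd(), Path.home()))
    for path, error in check_extractor_paths(os.environ.get("OPENLINEAGE_EXTRACTORS", "")).items():
        print(path, "OK" if error is None else error)
```

Running this with the same interpreter and environment variables as the scheduler or triggerer is the point: an extractor path that imports fine on a worker can still fail in those processes.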
Warnings
- breaking In version 1.40.0, the `__version__` attribute was missing from top-level modules, which could break tools that check the version programmatically.
- gotcha This library (`openlineage-integration-common`) provides common models and utilities primarily for *building* OpenLineage integrations, and is mostly consumed as a dependency. It is NOT the primary client library for sending OpenLineage events; that role belongs to `openlineage-python` (`openlineage.client`).
- gotcha Core OpenLineage facet definitions (e.g., `SchemaDatasetFacet`, `RunFacet`) live in the `openlineage.client.facet` module (and, in newer releases, `openlineage.client.facet_v2`), which belongs to the `openlineage-python` package, not to `openlineage.common`.
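Because a release shipped without `__version__` in top-level modules, a version check that reads installed package metadata is more robust than one that reads the module attribute. A minimal sketch, assuming Python 3.8+ and the distribution name `openlineage-integration-common`; `installed_version` and `release_tuple` are hypothetical helpers built on the standard library's `importlib.metadata`.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution from its metadata, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def release_tuple(ver: str) -> Tuple[int, ...]:
    """'1.46.0' -> (1, 46, 0); any suffix after the numeric release (e.g. 'rc1') is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"not a dotted numeric version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


if __name__ == "__main__":
    ver = installed_version("openlineage-integration-common")
    if ver is None:
        print("openlineage-integration-common is not installed")
    else:
        print(f"openlineage-integration-common {ver}, at least 1.40.0: {release_tuple(ver) >= (1, 40, 0)}")
```

Comparing integer tuples avoids the string-comparison trap where `"1.9.0" > "1.40.0"`.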
Install
pip install openlineage-integration-common
Imports
- DbTableSchema
from openlineage.common.models import DbTableSchema
- parse
from openlineage.common.sql import parse
- DbColumn
from openlineage.common.models import DbColumn
- Source
from openlineage.common.dataset import Source
Quickstart
from openlineage.common.dataset import Source
from openlineage.common.models import DbColumn, DbTableSchema
from openlineage.common.sql import parse

# Example: Defining a database table schema
db_table = DbTableSchema(
    schema_name='public',
    table_name='my_table',
    columns=[
        DbColumn(name='id', type='int'),
        DbColumn(name='name', type='string'),
    ],
)
print(f"Defined DB Table: {db_table}")

# Example: Defining a data source ('postgres' is the scheme used by the OpenLineage naming spec)
my_source = Source(scheme='postgres', authority='localhost:5432', connection_url='jdbc:postgresql://localhost:5432/mydb')
print(f"Defined Source: {my_source.name}")

# Example: Parsing a SQL statement to find the tables it reads from
sql_meta = parse(['SELECT * FROM public.my_table'])
print(f"Input tables: {sql_meta.in_tables if sql_meta else []}")
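The `scheme` and `authority` values passed to `Source` mirror how OpenLineage names datasets. For reference, the OpenLineage naming specification identifies a Postgres table with the namespace `postgres://{host}:{port}` and the name `{database}.{schema}.{table}`. The sketch below builds such an identifier in plain Python; `DatasetId` and `postgres_dataset_id` are hypothetical names for illustration, not part of `openlineage-integration-common`.

```python
from typing import NamedTuple


class DatasetId(NamedTuple):
    namespace: str  # where the dataset lives, e.g. "postgres://localhost:5432"
    name: str       # the dataset's name within that namespace


def postgres_dataset_id(host: str, port: int, database: str, schema: str, table: str) -> DatasetId:
    """Hypothetical helper following the OpenLineage naming spec for Postgres tables."""
    return DatasetId(
        namespace=f"postgres://{host}:{port}",
        name=f"{database}.{schema}.{table}",
    )


if __name__ == "__main__":
    ds = postgres_dataset_id("localhost", 5432, "mydb", "public", "my_table")
    print(ds.namespace, ds.name)  # postgres://localhost:5432 mydb.public.my_table
```

OpenLineage identifies a dataset by the (namespace, name) pair, so both parts must match across producers for their lineage to connect.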