Airbyte Protocol Dataclass Models

Version: 0.18.0 | Verified: Sun Apr 12 | Auth: not required | Language: Python

This library defines the Airbyte Protocol models as Python dataclasses. It is intended for scenarios where speed and memory usage are critical, since dataclasses carry less performance overhead than the equivalent Pydantic models. The library is actively maintained, with releases typically published monthly in step with the broader Airbyte platform. The current version is 0.18.0.

pip install airbyte-protocol-models-dataclasses
error ModuleNotFoundError: No module named 'airbyte_protocol_models_dataclasses'
cause The 'airbyte-protocol-models-dataclasses' package is not installed in the Python environment.
fix Install the package with pip, using the same interpreter that runs your code: 'python -m pip install airbyte-protocol-models-dataclasses'.
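If the error persists after installing, the package was often installed for a different interpreter than the one running your code. The sketch below uses only the standard library's `importlib.metadata` to report whether the distribution is visible to the current interpreter; the `installed_version` helper is hypothetical, not part of the library.

```python
import sys
from importlib import metadata
from typing import Optional

DIST_NAME = "airbyte-protocol-models-dataclasses"


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the version of dist_name visible to this interpreter, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print(f"{DIST_NAME} is not installed for {sys.executable}")
        print(f"Try: {sys.executable} -m pip install {DIST_NAME}")
    else:
        print(f"{DIST_NAME} {version} is installed for {sys.executable}")
```

If a version is reported but the import still fails, check the module name in the import statement: a distribution's name and its import name do not have to match.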
error ImportError: cannot import name 'AirbyteMessage' from 'airbyte_protocol_models_dataclasses'
cause 'AirbyteMessage' is not exposed at the package's top level; the model classes are defined in the models submodule.
fix Import it from the models submodule, e.g. 'from airbyte_protocol_models_dataclasses.v0.models import AirbyteMessage', as the quickstart below does.
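To see which module actually exposes a name, a small probe with `importlib` can try candidate module paths in order. The `locate` helper below is hypothetical; the candidate paths in its usage line are the package root and the models path this page's quickstart imports from.

```python
import importlib
from typing import Iterable, Optional


def locate(name: str, candidates: Iterable[str]) -> Optional[str]:
    """Return the first module path in candidates that defines name, else None."""
    for module_path in candidates:
        try:
            module = importlib.import_module(module_path)
        except ImportError:  # also covers ModuleNotFoundError
            continue
        if hasattr(module, name):
            return module_path
    return None


if __name__ == "__main__":
    print(locate("AirbyteMessage", [
        "airbyte_protocol_models_dataclasses",
        "airbyte_protocol_models_dataclasses.v0.models",
    ]))
```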
error TypeError: __init__() missing 1 required positional argument: 'type'
cause An instance of a dataclass from 'airbyte_protocol_models_dataclasses' is being initialized without providing all required fields.
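fix Pass every required field when creating the instance. For 'AirbyteMessage', 'type' is required, e.g. 'AirbyteMessage(type=Type.RECORD, record=record_message)'. Dataclasses perform no type validation, but they do reject a missing required argument.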
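Dataclasses enforce only one thing at construction time: every field without a default must be supplied. The sketch below lists those required fields with the standard library's `dataclasses.fields()`; `MessageStandIn` is a hypothetical stand-in shaped like `AirbyteMessage` (one required `type` field plus optional payload fields), not the library's class.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any, List, Optional


@dataclass
class MessageStandIn:
    # Hypothetical stand-in; the real AirbyteMessage has more optional fields.
    type: str
    record: Optional[Any] = None
    state: Optional[Any] = None


def required_fields(cls: type) -> List[str]:
    """Names of fields that have neither a default nor a default_factory."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


if __name__ == "__main__":
    print(required_fields(MessageStandIn))  # ['type']
    try:
        MessageStandIn()  # omits the required 'type' argument
    except TypeError as exc:
        print(f"TypeError: {exc}")
```

With the library installed, `required_fields(AirbyteMessage)` should give the equivalent list for the real model, assuming it is a standard-library dataclass.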
error AttributeError: 'dict' object has no attribute 'to_dict'
cause Attempting to call 'to_dict()' on a dictionary object instead of a dataclass instance.
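fix Call serialization helpers on a model instance, not on a plain dictionary. Python's '@dataclass' decorator does not itself generate a 'to_dict()' method; 'dataclasses.asdict()', used in the quickstart below, converts a model instance to a dictionary, and a value that is already a 'dict' needs no conversion.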
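When a code path may receive either a model instance or an already-converted dictionary, a small normalizing helper avoids calling a conversion method on the wrong type. `to_plain_dict` and `RecordStandIn` below are hypothetical; the helper relies only on the standard library's `dataclasses.is_dataclass()` and `dataclasses.asdict()`.

```python
from dataclasses import asdict, dataclass, is_dataclass
from typing import Any, Dict


@dataclass
class RecordStandIn:
    # Hypothetical stand-in shaped like AirbyteRecordMessage.
    stream: str
    data: Dict[str, Any]
    emitted_at: int


def to_plain_dict(obj: Any) -> Dict[str, Any]:
    """Return a plain dict for a dataclass instance or an existing dict."""
    if isinstance(obj, dict):
        return obj  # already converted; nothing to do
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)  # recursively converts nested dataclasses too
    raise TypeError(f"expected a dataclass instance or dict, got {type(obj).__name__}")
```

The `isinstance(obj, type)` check matters because `is_dataclass()` also returns `True` for the dataclass class itself, which `asdict()` cannot convert.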
error ValueError: Invalid enum value 'UNKNOWN' for field 'status'
cause An invalid value is being assigned to an enum field in a dataclass from 'airbyte_protocol_models_dataclasses'.
fix Assign a member of the Enum class that the models module defines for that field, rather than an arbitrary string; the permitted values are those listed in the protocol's JSON schema.
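Raw strings from configuration or upstream JSON can be converted to enum members up front, so an invalid value fails early with a clear message. `Status` below is a hypothetical stand-in modeled on the protocol's connection-status values (`SUCCEEDED`, `FAILED`); with the library, pass the Enum class that the models module defines for the field instead. `parse_enum` is likewise a hypothetical helper.

```python
import typing
from enum import Enum

E = typing.TypeVar("E", bound=Enum)


class Status(Enum):
    # Hypothetical stand-in for a protocol enum; use the models module's own class instead.
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


def parse_enum(enum_cls: typing.Type[E], raw: str) -> E:
    """Look up raw by value in enum_cls, listing the valid values on failure."""
    try:
        return enum_cls(raw)
    except ValueError:
        valid = ", ".join(repr(member.value) for member in enum_cls)
        raise ValueError(
            f"invalid {enum_cls.__name__} value {raw!r}; expected one of: {valid}"
        ) from None


if __name__ == "__main__":
    print(parse_enum(Status, "SUCCEEDED"))  # Status.SUCCEEDED
    try:
        parse_enum(Status, "UNKNOWN")
    except ValueError as exc:
        print(exc)  # invalid Status value 'UNKNOWN'; expected one of: 'SUCCEEDED', 'FAILED'
```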
breaking Protocol V1 models have been removed starting from version 0.17.0. Code relying on `v1` namespaces or structures will break.
fix Migrate all protocol model usage to the `v0` namespace, e.g., `from airbyte_protocol_models_dataclasses.v0.models import ...`.
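To find code that still depends on the removed V1 models, an `ast`-based scan can flag imports whose module path contains a `v1` segment. This is a hypothetical helper that assumes V1 usages show up as ordinary import statements; dynamic imports or module paths held in strings would not be caught.

```python
import ast
import sys
from pathlib import Path
from typing import List, Tuple


def find_v1_imports(source: str) -> List[Tuple[int, str]]:
    """Return (line number, module path) for each import with a 'v1' path segment."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and node.module:
            modules = [node.module]
        elif isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        else:
            continue
        for module in modules:
            if "v1" in module.split("."):
                hits.append((node.lineno, module))
    return sorted(hits)


if __name__ == "__main__":
    # Usage: python find_v1.py path/to/file.py [more files ...]
    for name in sys.argv[1:]:
        text = Path(name).read_text(encoding="utf-8")
        for lineno, module in find_v1_imports(text):
            print(f"{name}:{lineno}: imports {module}")
```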
breaking The field name 'schema' was renamed to 'json-schema' in version 0.16.0. This affects how catalog objects are structured and accessed.
fix Update any code that accesses or creates catalog configurations to use the `json-schema` field name instead of `schema`.
gotcha This library implements the protocol models as Python's native dataclasses, specifically to avoid Pydantic's performance overhead. Users accustomed to Pydantic should not expect its features (e.g., automatic type coercion, runtime validation, custom validators); none of them are available natively.
fix Implement explicit type checking and validation logic if strict runtime validation is required, or consider using `airbyte-protocol-models-pdv2` if Pydantic's features are essential and the performance trade-off is acceptable.
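One way to add explicit type checking is a small validator driven by the dataclass's own type hints. This sketch checks only top-level field types: container contents are not inspected, and `bool` passes as `int` because it is a subclass. `RecordStandIn` is a hypothetical stand-in shaped like `AirbyteRecordMessage`, and `validate` is not part of the library.

```python
import types
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional, Union, get_args, get_origin, get_type_hints

# typing.Union covers Optional[X]; types.UnionType covers `X | None` on Python 3.10+.
_UNION_ORIGINS = (Union,) + ((types.UnionType,) if hasattr(types, "UnionType") else ())


@dataclass
class RecordStandIn:
    # Hypothetical stand-in shaped like AirbyteRecordMessage, not the library's class.
    stream: str
    data: Dict[str, Any]
    emitted_at: int
    namespace: Optional[str] = None


def _runtime_types(hint: Any) -> tuple:
    """Map a type hint to the classes that isinstance() should accept."""
    if hint is Any:
        return (object,)
    origin = get_origin(hint)
    if origin in _UNION_ORIGINS:
        return tuple(t for arg in get_args(hint) for t in _runtime_types(arg))
    if origin is not None:  # e.g. Dict[str, Any] -> dict; item types are not checked
        return (origin,)
    return (hint,)


def validate(instance: Any) -> List[str]:
    """Return one message per top-level field whose value has the wrong type."""
    hints = get_type_hints(type(instance))
    errors = []
    for f in fields(instance):
        value = getattr(instance, f.name)
        if not isinstance(value, _runtime_types(hints[f.name])):
            errors.append(f"{f.name}: expected {hints[f.name]}, got {type(value).__name__}")
    return errors


if __name__ == "__main__":
    # Construction succeeds even though emitted_at is a string: dataclasses do not check types.
    bad = RecordStandIn(stream="users", data={"id": 1}, emitted_at="now")
    print(validate(bad))
```

Calling `validate()` right after constructing a message, or only in tests, restores some of the safety Pydantic would provide while keeping the check out of hot paths where the overhead matters.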

This quickstart demonstrates how to construct basic Airbyte Protocol messages (Record and State) using the dataclass models. It creates an `AirbyteRecordMessage` and an `AirbyteStateMessage`, wraps each in a top-level `AirbyteMessage`, and serializes the result to JSON, the format Airbyte connectors use to exchange messages between processes. Because the models perform no runtime validation, every field must already have the type the protocol's JSON schema expects.

import json
from dataclasses import asdict
from datetime import datetime
from enum import Enum
from airbyte_protocol_models_dataclasses.v0.models import (
    AirbyteMessage,
    AirbyteRecordMessage,
    AirbyteStateMessage,
    Type,
)


def _encode(obj):
    # asdict() leaves Enum members such as Type.RECORD as Enum objects;
    # write them as their plain values.
    if isinstance(obj, Enum):
        return obj.value
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def to_json(message):
    # Convert the dataclass tree to dicts, leaving out unset optional fields (None)
    # so they do not appear as null, then serialize to a JSON string.
    as_dict = asdict(message, dict_factory=lambda items: {k: v for k, v in items if v is not None})
    return json.dumps(as_dict, indent=2, default=_encode)


now = datetime.now()

# Create an AirbyteRecordMessage representing a data record
record_data = {
    "id": 1,
    "name": "Test User",
    "email": "test.user@example.com",
    "created_at": now.isoformat(),
}
record_message = AirbyteRecordMessage(
    stream="users",
    data=record_data,
    emitted_at=int(now.timestamp() * 1000),  # epoch milliseconds
)

# Wrap the record in a top-level AirbyteMessage
airbyte_message = AirbyteMessage(
    type=Type.RECORD,
    record=record_message,
)

print("\n--- Airbyte Record Message ---")
print(to_json(airbyte_message))

# Create an AirbyteStateMessage; the `state` field expects this dataclass, not a raw dict.
# `data` carries legacy (untyped) state; newer connectors emit per-stream state instead.
state_data = {"users_sync_progress": {"last_id": 100, "updated_at": now.isoformat()}}
state_message = AirbyteMessage(
    type=Type.STATE,
    state=AirbyteStateMessage(data=state_data),
)

print("\n--- Airbyte State Message ---")
print(to_json(state_message))
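The quickstart pretty-prints each message for readability. Connectors, however, write messages to stdout as newline-delimited JSON, one compact object per line. The sketch below shows that framing with hypothetical stand-in dataclasses (`TypeStandIn`, `RecordStandIn`, `MessageStandIn`) so it runs without the library installed; `to_json_line` and `emit` are assumptions for illustration, not library functions.

```python
import json
import sys
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional, TextIO


class TypeStandIn(Enum):
    # Hypothetical stand-in for the protocol's Type enum.
    RECORD = "RECORD"
    STATE = "STATE"


@dataclass
class RecordStandIn:
    # Hypothetical stand-in shaped like AirbyteRecordMessage.
    stream: str
    data: Dict[str, Any]
    emitted_at: int
    namespace: Optional[str] = None


@dataclass
class MessageStandIn:
    # Hypothetical stand-in shaped like AirbyteMessage (payload fields trimmed).
    type: TypeStandIn
    record: Optional[RecordStandIn] = None


def _drop_none(items):
    # dict_factory for asdict(): applied to dataclass fields only, so None values
    # inside the record's own `data` dict are kept.
    return {key: value for key, value in items if value is not None}


def _encode(obj: Any) -> Any:
    # json.dumps fallback: write Enum members as their values.
    if isinstance(obj, Enum):
        return obj.value
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def to_json_line(message: Any) -> str:
    """Serialize one message as compact JSON with no embedded newlines."""
    return json.dumps(asdict(message, dict_factory=_drop_none), default=_encode, separators=(",", ":"))


def emit(message: Any, out: Optional[TextIO] = None) -> None:
    """Write one message per line, as connectors do on stdout."""
    target = out if out is not None else sys.stdout
    target.write(to_json_line(message) + "\n")


if __name__ == "__main__":
    record = RecordStandIn(stream="users", data={"id": 1}, emitted_at=1700000000000)
    emit(MessageStandIn(type=TypeStandIn.RECORD, record=record))
```

Pretty-printed output spans several lines, so a consumer that reads one message per line could not parse it; compact single-line JSON avoids that.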