Dataclasses Avro Schema
Dataclasses Avro Schema is a Python library that generates Avro schemas from Python dataclasses, Pydantic models, and Faust Records. It can also serialize Python instances against those schemas and deserialize them back. The library is actively maintained with frequent releases; the current version is 0.66.3.
Warnings
- breaking: Python 3.9 support was dropped in version 0.66.0. Users on Python 3.9 or older must upgrade to Python 3.10 or later.
- breaking: As of version 0.14.0, schemas are generated by inheriting directly from `AvroModel` rather than through `SchemaGenerator`. Code that still relies on `SchemaGenerator` will no longer work as expected.
- breaking: As of version 0.23.0, `types.Enum` was replaced by standard Python `enum.Enum` classes (optionally mixed with `str`). Define a custom enum class instead of passing a list of symbols to `types.Enum`.
- gotcha: Avro requires a union field's default value to match the first type in the union. For an optional field declared as `typing.Optional[str] = None`, the default is `None`, so the generated schema is `["null", "string"]`. Give optional fields an explicit `None` default when you want `null` listed first; this avoids schema resolution issues.
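The union rule in the last warning can be checked mechanically. The sketch below uses only the standard library; `default_matches_first_branch` and `PRIMITIVES_BY_PYTHON_TYPE` are hypothetical names, not part of dataclasses-avroschema, and the check covers primitive branches only (a named type such as a record or enum in first position is reported as a mismatch).

```python
# Hypothetical helper, not part of dataclasses-avroschema.
# Maps the Python type of a JSON default to the Avro primitives it can fill.
PRIMITIVES_BY_PYTHON_TYPE = {
    type(None): {"null"},
    bool: {"boolean"},
    int: {"int", "long"},
    float: {"float", "double"},
    str: {"string"},
}


def default_matches_first_branch(field: dict) -> bool:
    """Return True when a field's default fits the first type of its union."""
    union = field.get("type")
    # Nothing to check for non-union types or fields without a default.
    if not isinstance(union, list) or not union or "default" not in field:
        return True
    first = union[0]
    allowed = PRIMITIVES_BY_PYTHON_TYPE.get(type(field["default"]), set())
    # Only primitive branch names (plain strings) are handled in this sketch.
    return isinstance(first, str) and first in allowed


null_first = {"name": "address", "type": ["null", "string"], "default": None}
string_first = {"name": "address", "type": ["string", "null"], "default": None}

print(default_matches_first_branch(null_first))    # True
print(default_matches_first_branch(string_first))  # False
```

Running a check like this over the `fields` of a generated schema is one way to catch a misordered union before the schema reaches a registry.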
Install
- pip install dataclasses-avroschema
- pip install 'dataclasses-avroschema[pydantic]' # For Pydantic integration
- pip install 'dataclasses-avroschema[faust]' # For Faust integration
- pip install 'dataclasses-avroschema[faker]' # For generating fake data
- pip install 'dataclasses-avroschema[cli]' # For CLI tools (dc-avro)
Imports
- AvroModel
from dataclasses_avroschema import AvroModel
- types.Enum (removed; use the standard library `enum` module instead)
import enum
class MyEnum(enum.Enum): ...
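As a minimal sketch of the `enum.Enum` replacement, the class below mixes in `str`, as the warning about `types.Enum` suggests is possible. It uses only the standard library. That the member values are what end up as the Avro enum symbols is an assumption here, not something this page states.

```python
import enum


class FavoriteColor(str, enum.Enum):
    """Custom enum class; the values are assumed to serve as Avro symbols."""

    BLUE = "Blue"
    YELLOW = "Yellow"
    GREEN = "Green"


# The member values, in definition order.
print([color.value for color in FavoriteColor])  # ['Blue', 'Yellow', 'Green']

# Looking a member up by value returns the same member object.
print(FavoriteColor("Blue") is FavoriteColor.BLUE)  # True

# With the str mixin, members also compare equal to their plain string value.
print(FavoriteColor.BLUE == "Blue")  # True
```

The `str` mixin is optional; it mainly helps when the members need to behave like plain strings elsewhere in the code, for example in comparisons or JSON encoding.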
Quickstart
import dataclasses
import enum
import typing

from dataclasses_avroschema import AvroModel


class FavoriteColor(enum.Enum):
    BLUE = "Blue"
    YELLOW = "Yellow"
    GREEN = "Green"


@dataclasses.dataclass
class User(AvroModel):
    "A User"
    name: str
    age: int
    pets: typing.List[str]
    accounts: typing.Dict[str, int]
    favorite_color: FavoriteColor
    country: str = "Argentina"
    address: typing.Optional[str] = None

    class Meta:
        namespace = "User.v1"
        aliases = ["user-v1", "super user"]


# Generate the Avro schema
avro_schema = User.avro_schema()
print("Avro Schema:")
print(avro_schema)

# Create an instance
user_instance = User(
    name="John Doe",
    age=30,
    pets=["dog", "cat"],
    accounts={"bank": 1000, "crypto": 500},
    favorite_color=FavoriteColor.BLUE,
    country="USA",
    address="123 Main St",
)

# Serialize to Avro binary
serialized_data = user_instance.serialize()
print("\nSerialized data (bytes):", serialized_data)

# Deserialize from Avro binary
deserialized_user = User.deserialize(serialized_data)
print("\nDeserialized user:", deserialized_user)