Avro Record Class and Specific Record Reader Generator

0.7.16 · active · verified Fri Apr 10

avro-gen3 is a Python library that generates concrete Avro record classes with type hints, plus a specific record reader. The standard Avro Python implementation is untyped: its DatumReader returns plain dictionaries. avro-gen3 addresses this by wrapping the standard DatumReader so that it returns the generated, type-hinted classes instead. The project is a fork of `avro_gen` with improved Python 3 support, better namespace handling, documentation generation, and JSON (de-)serialization. The current version, 0.7.16, was released on September 5, 2024; the project is active, though its releases are irregular.
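
The core idea (a typed class over a record dict, plus a reader that re-wraps what a generic reader returns) can be sketched without the library. This is a minimal, stdlib-only illustration, not avro-gen3's generated code: `UserRecord` and `SpecificReader` are hypothetical names, and a list of dicts stands in for the output of avro's `DatumReader`. The wrapper keeps field values in an inner dict, much as the generated classes keep theirs in `_inner_dict`.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


class UserRecord:
    """Hypothetical typed wrapper over an Avro record dict (stands in for a generated class)."""

    __slots__ = ("_inner_dict",)

    def __init__(self, name: str, favorite_number: Optional[int] = None) -> None:
        self._inner_dict: Dict[str, Any] = {}
        self.name = name
        self.favorite_number = favorite_number

    @property
    def name(self) -> str:
        return self._inner_dict["name"]

    @name.setter
    def name(self, value: str) -> None:
        self._inner_dict["name"] = value

    @property
    def favorite_number(self) -> Optional[int]:
        return self._inner_dict.get("favorite_number")

    @favorite_number.setter
    def favorite_number(self, value: Optional[int]) -> None:
        self._inner_dict["favorite_number"] = value

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "UserRecord":
        # Re-wrap a plain dict (as a generic reader would return it) in the typed class.
        return cls(**data)


class SpecificReader:
    """Hypothetical reader: wraps a dict-producing reader and yields typed records."""

    def __init__(self, generic_reader: Iterable[Dict[str, Any]]) -> None:
        self._generic_reader = generic_reader

    def __iter__(self) -> Iterator[UserRecord]:
        for datum in self._generic_reader:
            yield UserRecord.from_dict(datum)


# Stand-in for what a generic Avro DatumReader yields: plain dicts.
generic_rows = [
    {"name": "Alice", "favorite_number": 123},
    {"name": "Bob", "favorite_number": None},
]
users = list(SpecificReader(generic_rows))
print(users[0].name, users[1].favorite_number)  # Alice None
```

Callers get attribute access and type hints (`users[0].name`) instead of string-keyed lookups, while the underlying data stays a plain dict that Avro's writer and reader already understand.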

Quickstart

This quickstart defines an Avro schema, uses `avro-gen3` to generate Python classes from it, and then serializes and deserializes a record with those classes. It shows how a generated class is imported through a module path derived from the schema's namespace, and how the generated class's schema is used with avro's standard I/O tools (`DataFileWriter`, `DataFileReader`).
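
The namespace-to-module mapping can be sketched as follows, assuming the layout documented for the upstream `avro_gen` project: one nested package directory per dot-separated namespace component, each with an `__init__.py` that re-exports that namespace's record classes, and types with an empty namespace exported from the root package. The helper names and the `generated` package name are hypothetical. Because those namespace modules import from the generated package using relative imports, the generated directory must itself be importable as a package, so its parent directory is what goes on `sys.path`.

```python
from pathlib import Path


def namespace_module_file(output_dir: Path, namespace: str) -> Path:
    """Hypothetical helper: where a namespace's __init__.py is expected to live."""
    if not namespace:
        # Assumed from upstream avro_gen docs: empty-namespace types live in the root package.
        return output_dir / "__init__.py"
    # Each dot-separated namespace component becomes one nested package directory.
    return output_dir.joinpath(*namespace.split(".")) / "__init__.py"


def namespace_import_path(package_name: str, namespace: str) -> str:
    """Hypothetical helper: the dotted module path to import generated classes from."""
    return f"{package_name}.{namespace}" if namespace else package_name


print(namespace_module_file(Path("generated"), "com.example.app").as_posix())
# generated/com/example/app/__init__.py
print(namespace_import_path("generated", "com.example.app"))
# generated.com.example.app
```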

```python
import sys
import tempfile
from pathlib import Path

from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
from avrogen import write_schema_files

# 1. Define a simple Avro schema.
# A union field's default must match the union's first branch, so "null" comes first.
avro_schema_json = '''
{
  "type": "record",
  "name": "User",
  "namespace": "com.example.app",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["null", "int"], "default": null}
  ]
}
'''

# 2. Choose an output package for the generated classes.
# "generated" is an arbitrary package name. Its *parent* directory goes on sys.path,
# because the generated namespace modules import from the package relatively.
with tempfile.TemporaryDirectory() as tmpdir_name:
    package_root = Path(tmpdir_name)
    output_dir = package_root / "generated"
    print(f"Generated Avro classes will be written to: {output_dir}")

    # 3. Generate Python classes from the Avro schema (output_dir is created if missing)
    write_schema_files(avro_schema_json, str(output_dir))

    # Make the "generated" package importable
    sys.path.insert(0, str(package_root))

    try:
        # 4. Import the generated class.
        # The Avro namespace 'com.example.app' maps to generated/com/example/app/
        from generated.com.example.app import User

        # 5. Create an instance of the generated, type-hinted class
        user_record = User(name="Alice", favorite_number=123)
        print(f"Created user record: {user_record}")
        print(f"User name: {user_record.name}, Favorite number: {user_record.favorite_number}")

        # 6. Serialize with avro's standard DatumWriter/DataFileWriter.
        # Generated classes are DictWrapper subclasses: RECORD_SCHEMA holds the parsed
        # schema, and field values live in the plain dict _inner_dict, which is what
        # DatumWriter expects.
        output_file = output_dir / "users.avro"
        writer = DataFileWriter(open(output_file, "wb"), DatumWriter(), User.RECORD_SCHEMA)
        writer.append(user_record._inner_dict)
        writer.close()  # also closes the underlying file

        # 7. Deserialize. A plain DatumReader returns dicts, so re-wrap each one
        # in the generated class to get typed attribute access back.
        reader = DataFileReader(open(output_file, "rb"), DatumReader())
        for read_user_dict in reader:
            read_user = User(**read_user_dict)
            print(f"Deserialized user: {read_user.name}, {read_user.favorite_number}")
        reader.close()

    finally:
        # Clean up sys.path
        sys.path.remove(str(package_root))
```
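
The Avro specification ties a union field's default value to the first type in the union, so a field such as `favorite_number` with a `null` default should be declared as `["null", "int"]` rather than `["int", "null"]`. Below is a minimal stdlib-only lint for that one case; `check_union_defaults` is a hypothetical helper, not part of avro-gen3 or avro, and it only inspects the top-level fields of a single record schema that declare a `null` default.

```python
import json
from typing import Any, Dict, List


def check_union_defaults(schema_json: str) -> List[str]:
    """Hypothetical lint: flag fields whose null default doesn't match the union's first branch."""
    schema: Dict[str, Any] = json.loads(schema_json)
    problems: List[str] = []
    for field in schema.get("fields", []):
        field_type = field["type"]
        # Only union fields (JSON arrays) that declare a default are checked.
        if isinstance(field_type, list) and "default" in field:
            first = field_type[0]
            # JSON null parses to None; it only matches a union whose first branch is "null".
            if field["default"] is None and first != "null":
                problems.append(f"{field['name']}: default null but first branch is {first!r}")
    return problems


bad = '{"type": "record", "name": "User", "fields": [{"name": "favorite_number", "type": ["int", "null"], "default": null}]}'
good = bad.replace('["int", "null"]', '["null", "int"]')
print(check_union_defaults(bad))   # ["favorite_number: default null but first branch is 'int'"]
print(check_union_defaults(good))  # []
```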
