Apache Avro Python

1.12.1 · active · verified Sun Apr 05

Avro is a data serialization and RPC framework for various languages, including Python. It uses JSON for defining data types and protocols and serializes data in a compact binary format. The Python library provides tools for schema parsing, binary encoding/decoding, and working with Avro Data Files. The current version is 1.12.1, with releases typically occurring a few times a year for minor or patch updates.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to define an Avro schema, serialize Python dictionaries (records) into an Avro data file (here, in-memory using `io.BytesIO`), and then deserialize them back into Python dictionaries. It uses `avro.schema.parse` to load the schema, `DataFileWriter` and `DatumWriter` to write, and `DataFileReader` and `DatumReader` to read.

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
import io

# Define schema
schema_str = '''
{
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "favorite_number", "type": ["int", "null"]},
        {"name": "favorite_color", "type": ["string", "null"]}
    ]
}
'''
schema = avro.schema.parse(schema_str)

# Prepare data
users = [
    {"name": "Alyssa", "favorite_number": 256, "favorite_color": "red"},
    {"name": "Ben", "favorite_number": 7, "favorite_color": "blue"},
    {"name": "Charlie", "favorite_number": None, "favorite_color": "green"},
    {"name": "David", "favorite_number": 42, "favorite_color": None}
]

# Write data to an in-memory Avro file
# Using io.BytesIO for an in-memory file-like object
output_stream = io.BytesIO()
writer = DataFileWriter(output_stream, DatumWriter(), schema)
for user in users:
    writer.append(user)
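# flush() writes the pending block to the stream. Capture the bytes before
# close(), because DataFileWriter.close() also closes the underlying stream.
writer.flush()
avro_bytes = output_stream.getvalue()
writer.close()

# Read data back from a fresh in-memory stream positioned at the beginning
input_stream = io.BytesIO(avro_bytes)
reader = DataFileReader(input_stream, DatumReader())
print("Reading Avro data:")
for user in reader:
    print(user)
# close() also closes input_stream
reader.close()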
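
To give a feel for the compact binary format mentioned in the summary, the following is a minimal standard-library sketch of how Avro lays out `int`/`long` and `string` values: integers as zig-zag variable-length bytes, strings as a length followed by UTF-8 data. The helper names `zigzag_varint` and `encode_string` are hypothetical and are not part of the `avro` package; in practice the library's own encoder does this work.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer the way Avro encodes int/long values.

    Hypothetical helper for illustration only; not part of the avro package.
    """
    # Zig-zag step: interleave negatives and positives so that values of
    # small magnitude become small unsigned numbers (0, -1, 1, -2 -> 0, 1, 2, 3).
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    # Varint step: emit 7 bits per byte, low bits first; the high bit of a
    # byte is set when more bytes follow.
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 byte length (as a long) plus the bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data


if __name__ == "__main__":
    for value in (0, -1, 1, 64, 256):
        print(value, zigzag_varint(value).hex())
    print("Ben", encode_string("Ben").hex())
```

Under this scheme a small integer such as 7 occupies a single byte, which is part of why Avro payloads tend to be smaller than the equivalent JSON text. Record fields are written in schema order with no field names in the payload, so a reader needs the writer's schema to decode the bytes.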
