Apache Avro (Python 3)
Apache Avro (`avro-python3`) is the Python 3 package published by the Apache Avro project, a data serialization and remote procedure call (RPC) framework. Avro schemas are written in JSON and are language-agnostic, and data is serialized into a compact binary format, so records written in one language can be read in another. The latest version of `avro-python3` on PyPI is 1.10.2.
Warnings
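- gotcha `avro-python3` is deprecated. It was published separately to support Python 3, but its codebase has since been consolidated into the `avro` package, which supports both Python 2 and Python 3 and is the package the Avro project now recommends (`pip install avro`). Both install the same top-level `avro` module and are mostly API-compatible, with a few minor differences such as function-name capitalization (for example, `avro.schema.Parse` in `avro-python3` versus `avro.schema.parse` in `avro`).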
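Because both distributions are imported as `avro`, the import statement alone does not tell you which one an environment actually has. A minimal sketch that uses the standard library's `importlib.metadata` (Python 3.8+) to report installed versions; the helper name `installed_avro_distributions` is hypothetical and not part of either package.

```python
from importlib import metadata


def installed_avro_distributions():
    """Map each Avro distribution name to its installed version, or None.

    Hypothetical helper: it only queries package metadata and never imports avro.
    """
    found = {}
    for dist in ("avro", "avro-python3"):
        try:
            found[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            found[dist] = None  # distribution not installed
    return found


for name, version in installed_avro_distributions().items():
    print(f"{name}: {version or 'not installed'}")
```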
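- gotcha Avro decodes data by resolving the writer's schema against the reader's schema. Misunderstanding the schema evolution rules (for example, adding a field without a default value, removing a field the reader still requires, or changing a field's type to an incompatible one) leads to deserialization errors when producers and consumers use different schema versions.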
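The field-level part of these rules can be sketched in plain Python. This is a simplified illustration of the record-resolution rules in the Avro specification, not the avro library's implementation: it covers only field presence and defaults and omits type promotion, aliases, and recursive resolution of nested types. `resolve_record` and the example schemas are hypothetical.

```python
# Simplified sketch of Avro record resolution (writer schema -> reader schema).
# Covers field presence and defaults only; type promotion, aliases and nested
# resolution from the Avro specification are intentionally left out.

_MISSING = object()  # sentinel: distinguishes "no default" from "default": null


def resolve_record(writer_fields, reader_fields, datum):
    """Project a datum decoded with the writer's fields onto the reader's fields."""
    writer_names = {f["name"] for f in writer_fields}
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in writer_names:
            # Field present in both schemas: keep the writer's value.
            result[name] = datum[name]
        elif field.get("default", _MISSING) is not _MISSING:
            # Reader-only field with a default: use the default.
            result[name] = field["default"]
        else:
            # Reader-only field without a default: resolution fails.
            raise ValueError(f"field {name!r} is not in the writer schema and has no default")
    # Writer-only fields are never copied, so they are ignored.
    return result


# Hypothetical v1 (writer) and v2 (reader) versions of the User fields.
writer_v1 = [
    {"name": "name", "type": "string"},
    {"name": "favorite_color", "type": ["string", "null"]},
]
reader_v2 = [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int", "default": 0},
]

print(resolve_record(writer_v1, reader_v2, {"name": "Ben", "favorite_color": "red"}))
# {'name': 'Ben', 'age': 0}
```

Dropping `"default": 0` from the reader's `age` field makes the same call raise `ValueError`, which mirrors the "new field without a default" failure described above.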
- gotcha Avro object container files (conventionally with the `.avro` extension) are binary. Apart from the JSON schema stored in the file header, their contents look like garbage in a text editor; read them with `avro.datafile.DataFileReader` instead.
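When a file looks unreadable, checking its first four bytes tells you whether it is an Avro object container file at all. The Avro specification defines the header magic as the ASCII bytes `O`, `b`, `j` followed by the byte value 1. A minimal sketch; the helper names `is_avro_container` and `sniff_file` are hypothetical.

```python
# Avro object container files begin with the 4-byte magic b"Obj\x01"
# (ASCII 'O', 'b', 'j' followed by the byte 1), per the Avro specification.
AVRO_MAGIC = b"Obj\x01"


def is_avro_container(data: bytes) -> bool:
    """Return True if data starts with the Avro container-file magic."""
    return data[:4] == AVRO_MAGIC


def sniff_file(path: str) -> bool:
    """Check a file on disk by reading only its first four bytes."""
    with open(path, "rb") as f:
        return is_avro_container(f.read(4))


print(is_avro_container(b"Obj\x01\x04\x14avro.codec"))  # True
print(is_avro_container(b'{"name": "Ben"}'))  # False
```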
Install
pip install avro-python3
Imports
- avro.schema
import avro.schema
- avro.io
import avro.io
- avro.datafile
import avro.datafile
Quickstart
import avro.schema
import avro.io
import avro.datafile
import io
# 1. Define the Avro schema
schema_str = """
{
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
"""
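# avro-python3 exposes the parser as avro.schema.Parse; the consolidated avro package names it avro.schema.parse
parse_schema = getattr(avro.schema, "parse", None) or avro.schema.Parse
schema = parse_schema(schema_str)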
# 2. Write data to a BytesIO object (simulating a file)
writer = avro.io.DatumWriter(schema)
bytes_writer = io.BytesIO()
data_file_writer = avro.datafile.DataFileWriter(bytes_writer, writer, schema)
data_file_writer.append({"name": "Alyssa", "favorite_number": 256, "favorite_color": None})
data_file_writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
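# Flush the buffered block and copy the bytes out before close(),
# because DataFileWriter.close() also closes the underlying BytesIO
data_file_writer.flush()
avro_data = bytes_writer.getvalue()
data_file_writer.close()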
# 3. Read data from the BytesIO object
bytes_reader = io.BytesIO(avro_data)
reader = avro.io.DatumReader(schema)
data_file_reader = avro.datafile.DataFileReader(bytes_reader, reader)
read_records = [record for record in data_file_reader]
data_file_reader.close()
# Print the read records
print(read_records)
# Expected output:
# [{'name': 'Alyssa', 'favorite_number': 256, 'favorite_color': None}, {'name': 'Ben', 'favorite_number': 7, 'favorite_color': 'red'}]
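To see why the format is compact, the sketch below hand-encodes a `User` record following the binary encoding rules in the Avro specification: `int` and `long` values are zigzag-encoded variable-length integers, strings are a byte length followed by UTF-8 bytes, a union is the zero-based branch index followed by the value, and a record is its fields concatenated in schema order with no field names. It is written from the specification for illustration rather than taken from the avro library, and the function names are hypothetical; with the default `null` codec, these should be the per-record bytes `DatumWriter` places inside a data block.

```python
# Sketch of Avro's binary encoding for the quickstart's User schema,
# written from the specification, not taken from the avro library.


def encode_long(n: int) -> bytes:
    """Zigzag-encode n, then emit it as a little-endian base-128 varint.

    Avro int and long share this encoding.
    """
    z = (n << 1) ^ (n >> 63)  # zigzag: small magnitudes map to small unsigned values
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """A string is its UTF-8 byte length (as a long) followed by the bytes."""
    raw = s.encode("utf-8")
    return encode_long(len(raw)) + raw


def encode_optional(value, encode_value) -> bytes:
    """Encode a ["T", "null"] union: branch index as a long, then the value.

    Index 0 is the first branch (T); index 1 is "null", which has no payload.
    """
    if value is None:
        return encode_long(1)
    return encode_long(0) + encode_value(value)


def encode_user(user: dict) -> bytes:
    """Concatenate the User fields in schema order, without field names."""
    return (
        encode_string(user["name"])
        + encode_optional(user["favorite_number"], encode_long)
        + encode_optional(user["favorite_color"], encode_string)
    )


print(encode_user({"name": "Ben", "favorite_number": 7, "favorite_color": "red"}))
# b'\x06Ben\x00\x0e\x00\x06red'
```

The `Ben` record takes 13 bytes because field names and types live only in the schema, which a container file stores once in its header.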