Apache Avro (Python 3)
`avro-python3` is the official Python 3 implementation of Apache Avro, a data serialization and remote procedure call framework. It lets you define language-agnostic data schemas and serialize data into a compact binary format for cross-language data exchange. The current stable version on PyPI is 1.10.2; releases follow the broader Apache Avro project.
Common errors
- ModuleNotFoundError: No module named 'avro.schema'
  cause: The `avro` package is installed instead of `avro-python3` in a Python 3 environment.
  fix: Install the correct package with `pip install avro-python3`.
- AttributeError: module 'avro' has no attribute 'schema'
  cause: The `avro` package, intended for Python 2, is being used in a Python 3 environment.
  fix: Run `pip uninstall avro`, then `pip install avro-python3`.
- ModuleNotFoundError: No module named 'pycodestyle'
  cause: The `pycodestyle` dependency is missing when installing `avro-python3` version 1.9.2.
  fix: Run `pip install pycodestyle` before installing `avro-python3`.
- NameError: name 'file' is not defined
  cause: The `dump` command of the `avro.tool` module uses `file` instead of `open`, and `file` does not exist in Python 3.
  fix: Update to `avro-python3` version 1.10.0 or later, where this issue is resolved.
- AvroTypeException: The datum is not an example of the schema
  cause: The data being serialized contains types that `avro-python3` does not support, such as `date` objects.
  fix: Convert `date` objects to strings or integers before serialization, or switch to the `avro` package, which supports logical types.
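The `date` conversion mentioned in the last fix can be sketched with the standard library alone. This is a minimal sketch, not part of `avro-python3`: the helper names `date_to_days` and `prepare_record` and the `signup_date` field are hypothetical, and it assumes the field is declared as `int` in the schema. Counting days since 1970-01-01 mirrors how the Avro specification encodes its `date` logical type.

```python
import datetime

EPOCH = datetime.date(1970, 1, 1)


def date_to_days(value):
    """Return the number of days between 1970-01-01 and `value`."""
    return (value - EPOCH).days


def is_plain_date(value):
    """True for datetime.date values that are not datetime.datetime."""
    return isinstance(value, datetime.date) and not isinstance(value, datetime.datetime)


def prepare_record(record):
    """Return a copy of `record` with plain date values replaced by day counts."""
    return {
        key: date_to_days(val) if is_plain_date(val) else val
        for key, val in record.items()
    }


# Hypothetical record; `signup_date` would be an "int" field in the schema.
record = {"name": "Alyssa", "signup_date": datetime.date(2021, 3, 15)}
print(prepare_record(record))
```

The prepared dictionary contains only strings and integers, so it can be passed to `DataFileWriter.append()` against a schema that uses primitive types.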
Warnings
- gotcha Install `avro-python3` for Python 3 projects. The older `avro` package (without the `-python3` suffix) was intended for Python 2 and may cause import errors or unexpected behavior under Python 3.
- gotcha Misunderstanding Avro's schema evolution rules (e.g., adding/removing fields, changing types, default values) can lead to data deserialization errors, especially when different versions of producers and consumers interact.
- gotcha Avro data files (`.avro` extension) are binary, not human-readable. Opening one in a text editor shows garbled output; read it back with `avro.datafile.DataFileReader` instead.
- gotcha `ValueError: I/O operation on closed file.` means an I/O call (such as `getvalue()`, `read()`, or `write()`) was made on a file-like object after it, or its underlying stream, was closed. Retrieve any data you need before closing the stream.
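The closed-file error described in the warnings can be reproduced with the standard library alone. A minimal sketch using a plain `io.BytesIO` (no Avro classes involved; the variable names are illustrative):

```python
import io

buffer = io.BytesIO()
buffer.write(b"serialized bytes")

# Copy the contents out while the stream is still open.
payload = buffer.getvalue()
buffer.close()

# Any further access to the closed stream raises ValueError.
try:
    buffer.getvalue()
except ValueError as exc:
    error_message = str(exc)

print(payload)
print(error_message)
```

The same ordering applies when another object owns the stream: read the bytes out first, then close.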
Install
pip install avro-python3
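Several of the errors above come from having the wrong distribution installed. One way to check what is present is to query installed package metadata. This is a sketch that assumes Python 3.8 or later (for `importlib.metadata`); the function name `installed_avro_distributions` is hypothetical.

```python
from importlib import metadata


def installed_avro_distributions():
    """Map each candidate distribution name to its installed version, or None."""
    versions = {}
    for name in ("avro-python3", "avro"):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


print(installed_avro_distributions())
```

If `avro` reports a version while `avro-python3` is `None`, the environment matches the cause described for the first two errors in the list above.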
Imports
- avro.schema
import avro.schema
- avro.io
import avro.io
- avro.datafile
import avro.datafile
Quickstart
import avro.schema
import avro.io
import avro.datafile
import io
# 1. Define the Avro schema
schema_str = """
{
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
"""
# avro-python3 spells this function `Parse`; the separate `avro` package uses lowercase `parse`
schema = avro.schema.Parse(schema_str)
# 2. Write data to a BytesIO object (simulating a file)
writer = avro.io.DatumWriter(schema)
bytes_writer = io.BytesIO()
data_file_writer = avro.datafile.DataFileWriter(bytes_writer, writer, schema)
data_file_writer.append({"name": "Alyssa", "favorite_number": 256, "favorite_color": None})
data_file_writer.append({"name": "Ben", "favorite_number": 7, "favorite_color": "red"})
# Flush and copy the serialized data out first: close() also closes the underlying BytesIO
data_file_writer.flush()
avro_data = bytes_writer.getvalue()
data_file_writer.close()
# 3. Read data from the BytesIO object
bytes_reader = io.BytesIO(avro_data)
reader = avro.io.DatumReader(schema)
data_file_reader = avro.datafile.DataFileReader(bytes_reader, reader)
read_records = [record for record in data_file_reader]
data_file_reader.close()
# Print the read records
print(read_records)
# Expected output:
# [{'name': 'Alyssa', 'favorite_number': 256, 'favorite_color': None}, {'name': 'Ben', 'favorite_number': 7, 'favorite_color': 'red'}]
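Because the serialized bytes are not human-readable, a quick sanity check is to look at the header rather than open the data in a text editor. Avro object container files begin with the four bytes `Obj` followed by `0x01`. A minimal sketch (the helper `looks_like_avro_container` is hypothetical, not part of `avro-python3`):

```python
AVRO_MAGIC = b"Obj\x01"  # 4-byte header of an Avro object container file


def looks_like_avro_container(data):
    """Return True if `data` starts with the Avro container magic bytes."""
    return bytes(data[:len(AVRO_MAGIC)]) == AVRO_MAGIC


print(looks_like_avro_container(b"Obj\x01" + b"\x00" * 16))
print(looks_like_avro_container(b"plain text"))
```

Applied to the Quickstart, `looks_like_avro_container(avro_data)` should return `True`.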