Fast Avro for Python

fastavro 1.12.1 · verified Tue May 12 · auth: no · install: verified · quickstart: verified

Fastavro is a high-performance Python library for reading and writing Avro files. It is a significantly faster alternative to the official Apache Avro Python library, using Cython-compiled C extensions for speed. The library supports several compression codecs (including deflate, snappy, and zstandard) and is actively maintained, making it a popular choice for high-throughput Avro serialization and deserialization in Python applications.

pip install fastavro
error ModuleNotFoundError: No module named 'fastavro'
cause The `fastavro` library has not been installed in your current Python environment.
fix
Run `pip install fastavro` in the same Python environment (virtualenv, conda env, or system interpreter) that runs your code; installing into a different environment than the one executing the script is a common cause of this error.
error fastavro.exceptions.SchemaResolutionError
cause This error occurs when the data being written or read does not conform to the Avro schema, or when there's an incompatibility between the writer's schema and the reader's schema, often related to missing required fields, mismatched types, or issues with aliases/named schemas.
fix
Ensure that your data strictly adheres to the provided Avro schema. When reading, verify that the reader's schema is compatible with the writer's schema, paying close attention to field names, types, and the use of aliases or union types. Validate your schema structure, especially for enums or complex types.
error TypeError: Expected dict, got str
cause This typically happens when attempting to write data that is not in the expected dictionary format, or when a field within a record has an incorrect data type (e.g., passing a string where an integer is expected) according to the Avro schema.
fix
Ensure that the records you pass to fastavro.writer form an iterable (typically a list) of dictionaries, and that each dictionary's values match the types defined in your Avro schema for the corresponding fields. For example, if your schema defines a record, you must pass a Python dictionary for that record, not a string or other type.
error ValueError: snappy codec is supported but you need to install python-snappy
cause You are attempting to read or write an Avro file compressed with the 'snappy' codec, but the necessary `python-snappy` library (or `cramjam` for newer `fastavro` versions) is not installed in your environment.
fix
Install the required compression library. For Snappy, run `pip install python-snappy` (or `pip install cramjam` for fastavro versions that use it for Snappy). Similarly, for Zstandard, run `pip install zstandard` (or `pip install backports.zstd` for older Python versions).
breaking The global cache of parsed schemas was removed in version 0.24.0. Code that relied on manipulating or accessing this global cache via `parse_schema` will break after upgrading.
fix Do not rely on a global cache for schemas. Pass parsed schemas explicitly or re-parse them as needed. `parse_schema` returns a parsed schema object that should be passed to `writer` or `reader` functions.
deprecated Using `python-snappy` for Snappy compression is deprecated. `fastavro` recommends `cramjam` for Snappy, Zstandard, and LZ4 compression due to better compatibility and features.
fix Install `cramjam` (`pip install cramjam`) and ensure it's available in your environment for Snappy compression.
gotcha Reading Avro data with an incompatible reader schema (e.g., missing fields, mismatched types) can lead to `SchemaResolutionError` or incorrect data during deserialization.
fix Always ensure your reader schema is compatible with the writer schema, especially when dealing with schema evolution. Use `parse_schema` and provide both `writer_schema` (from the file) and `reader_schema` (your application's expected schema) to `fastavro.reader` for schema resolution.
gotcha When serializing data, if a field is marked as 'required' in the Avro schema but is missing from the Python dictionary record, `fastavro` will raise an error.
fix Ensure all required fields are present in your Python dictionary records before passing them to `fastavro.writer`. For optional fields, explicitly use `null` if the field is omitted, or define a default value in the schema.
gotcha When appending records to an existing Avro file, the file must be opened in `a+b` mode (read and append binary). Passing `None` as the schema to the `writer` function is recommended, as the existing file's schema will be reused. Using `ab` mode or providing a schema will likely lead to errors.
fix Open the file with `mode='a+b'` and call `writer(file_object, None, more_records)`.
gotcha Using `parse_schema(..., expand=True)` generates a schema that may not fully conform to the Avro specification for all scenarios, especially when dealing with referenced schemas. The output of this function with `expand=True` should generally be considered for output/inspection only and not passed directly to `reader` or `writer` functions, as it might cause exceptions.
fix Avoid using `expand=True` if you intend to reuse the parsed schema for reading or writing. Instead, manage referenced schemas via the `named_schemas` argument in `parse_schema` if you have complex, inter-dependent schemas.
gotcha When reading a union of records, if `return_record_name=True` is specified in `reader()`, the result for a union type will be a tuple `(record_name, record_value)`. If a union contains only one record type, `return_record_name_override=True` can modify this behavior to return just the record value, without the name tuple.
fix Be aware of the return type when `return_record_name` is set. Adjust your code to unpack the `(name, value)` tuple or use `return_record_name_override=True` if you prefer a simpler return for single-type unions.
| python | os / libc | status | install | import | disk |
|--------|---------------|--------|---------|--------|-------|
| 3.9 | alpine (musl) | wheel | - | 0.07s | 25.3M |
| 3.9 | alpine (musl) | - | - | 0.06s | 25.4M |
| 3.9 | slim (glibc) | wheel | 2.2s | 0.06s | 29M |
| 3.9 | slim (glibc) | - | - | 0.06s | 29M |
| 3.10 | alpine (musl) | wheel | - | 0.07s | 25.9M |
| 3.10 | alpine (musl) | - | - | 0.07s | 26.1M |
| 3.10 | slim (glibc) | wheel | 1.8s | 0.05s | 30M |
| 3.10 | slim (glibc) | - | - | 0.05s | 30M |
| 3.11 | alpine (musl) | wheel | - | 0.08s | 28.3M |
| 3.11 | alpine (musl) | - | - | 0.09s | 28.5M |
| 3.11 | slim (glibc) | wheel | 1.9s | 0.08s | 32M |
| 3.11 | slim (glibc) | - | - | 0.07s | 32M |
| 3.12 | alpine (musl) | wheel | - | 0.06s | 20.3M |
| 3.12 | alpine (musl) | - | - | 0.11s | 20.5M |
| 3.12 | slim (glibc) | wheel | 1.7s | 0.06s | 24M |
| 3.12 | slim (glibc) | - | - | 0.06s | 24M |
| 3.13 | alpine (musl) | wheel | - | 0.05s | 19.9M |
| 3.13 | alpine (musl) | - | - | 0.05s | 20.0M |
| 3.13 | slim (glibc) | wheel | 1.6s | 0.05s | 24M |
| 3.13 | slim (glibc) | - | - | 0.05s | 24M |

This quickstart demonstrates how to define an Avro schema, write a list of Python dictionaries (records) into an in-memory Avro binary format using `fastavro.writer`, and then read those records back using `fastavro.reader`. It also highlights the use of `parse_schema` for efficiency and `codec` for compression.

import io
from fastavro import writer, reader, parse_schema

# 1. Define an Avro schema
schema = {
    'doc': 'A simple user record.',
    'name': 'User',
    'namespace': 'example.avro',
    'type': 'record',
    'fields': [
        {'name': 'name', 'type': 'string'},
        # A union's default must match its first branch, so 'null' comes first
        {'name': 'favorite_number', 'type': ['null', 'int'], 'default': None},
        {'name': 'favorite_color', 'type': ['string', 'null'], 'default': 'green'}
    ]
}

# It's optional but recommended to parse the schema once for performance
parsed_schema = parse_schema(schema)

# 2. Prepare some records
records = [
    {'name': 'Alice', 'favorite_number': 256, 'favorite_color': 'blue'},
    {'name': 'Bob', 'favorite_number': 7, 'favorite_color': None},
    {'name': 'Charlie', 'favorite_number': None, 'favorite_color': 'red'}
]

# 3. Write records to an in-memory Avro file (BytesIO)
bytes_writer = io.BytesIO()
writer(bytes_writer, parsed_schema, records, codec='deflate')

# 4. Read records back from the in-memory Avro file
bytes_writer.seek(0) # Rewind the buffer to the beginning
avro_reader = reader(bytes_writer)

read_records = list(avro_reader)

print("Original Records:", records)
print("Read Records:", read_records)

# Verify that read records match original records
assert records == read_records
print("Successfully wrote and read Avro records!")