Fast Avro for Python

1.12.1 · active · verified Sun Mar 29

Fastavro is a high-performance Python library for reading and writing Avro files. It is a significantly faster alternative to the official Apache Avro Python library, using C extensions (built with Cython) for speed. The library supports several compression codecs and is actively maintained, making it a popular choice for high-throughput Avro serialization and deserialization in Python applications.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to define an Avro schema, write a list of Python dictionaries (records) to an in-memory Avro file (a `BytesIO` buffer) using `fastavro.writer`, and then read those records back using `fastavro.reader`. It also shows `parse_schema`, which parses the schema once so it can be reused, and the `codec` argument, which selects compression.

import io
from fastavro import writer, reader, parse_schema

# 1. Define an Avro schema
schema = {
    'doc': 'A simple user record.',
    'name': 'User',
    'namespace': 'example.avro',
    'type': 'record',
    'fields': [
        {'name': 'name', 'type': 'string'},
        # A union's default must match its first branch, so 'null' comes first here
        {'name': 'favorite_number', 'type': ['null', 'int'], 'default': None},
        {'name': 'favorite_color', 'type': ['string', 'null'], 'default': 'green'}
    ]
}

# It's optional but recommended to parse the schema once for performance
parsed_schema = parse_schema(schema)

# 2. Prepare some records
records = [
    {'name': 'Alice', 'favorite_number': 256, 'favorite_color': 'blue'},
    {'name': 'Bob', 'favorite_number': 7, 'favorite_color': None},
    {'name': 'Charlie', 'favorite_number': None, 'favorite_color': 'red'}
]

# 3. Write records to an in-memory Avro file (BytesIO)
bytes_writer = io.BytesIO()
writer(bytes_writer, parsed_schema, records, codec='deflate')

# 4. Read records back from the in-memory Avro file
bytes_writer.seek(0) # Rewind the buffer to the beginning
avro_reader = reader(bytes_writer)

# The reader is an iterator of records; collect them into a list
read_records = list(avro_reader)

print("Original Records:", records)
print("Read Records:", read_records)

# Verify that read records match original records
assert records == read_records
print("Successfully wrote and read Avro records!")
