Pandavro

1.9.0 · active · verified Thu Apr 16

Pandavro provides a convenient interface for reading and writing Avro files as pandas DataFrames. It handles the serialization and deserialization of tabular data between pandas and the Avro data format. The current version is 1.9.0, and the project is actively maintained, with releases that track Python, pandas, and NumPy compatibility.

Quickstart

This quickstart creates a pandas DataFrame, writes it to an in-memory Avro stream with `pandavro.to_avro()`, and then reads it back into a new DataFrame with `pandavro.read_avro()`.

import pandas as pd
import pandavro as pa
import io

# 1. Create a pandas DataFrame
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'value': [10.1, 20.2, 30.3]
})

print("Original DataFrame:")
print(df)

# 2. Write the DataFrame to Avro (a BytesIO buffer stands in for a file here)
output_buffer = io.BytesIO()
pa.to_avro(output_buffer, df)  # the Avro schema is inferred from the DataFrame

# 3. Read Avro data back into a DataFrame
output_buffer.seek(0) # Reset buffer position for reading
read_df = pa.read_avro(output_buffer)

print("\nRead DataFrame from Avro:")
print(read_df)

# You can also use file paths directly:
# pa.to_avro('output.avro', df)
# loaded_df = pa.read_avro('output.avro')
