{"library":"avro","title":"Apache Avro Python","description":"Avro is a data serialization and RPC framework for various languages, including Python. It uses JSON for defining data types and protocols and serializes data in a compact binary format. The Python library provides tools for schema parsing, binary encoding/decoding, and working with Avro Data Files. The current version is 1.12.1, with releases typically occurring a few times a year for minor or patch updates.","status":"active","version":"1.12.1","language":"en","source_language":"en","source_url":"https://github.com/apache/avro","tags":["serialization","rpc","data-format","schema","apache"],"install":[{"cmd":"pip install avro","lang":"bash","label":"Install stable release"},{"cmd":"pip install avro[snappy,zstandard]","lang":"bash","label":"Install with optional compression codecs"}],"dependencies":[{"reason":"Optional Snappy compression support","package":"python-snappy","optional":true},{"reason":"Optional Zstandard compression support","package":"zstandard","optional":true}],"imports":[{"note":"Prior to Avro 1.10.x, or if using the now deprecated 'avro-python3' package, 'Parse' (capital 'P') was used. 
For the current 'avro' package (Python 3 compatible), use 'parse' (lowercase 'p').","wrong":"schema = avro.schema.Parse(json_schema_string)","symbol":"avro.schema.parse","correct":"import avro.schema\nschema = avro.schema.parse(json_schema_string)"},{"symbol":"DataFileReader","correct":"from avro.datafile import DataFileReader"},{"symbol":"DataFileWriter","correct":"from avro.datafile import DataFileWriter"},{"symbol":"DatumReader","correct":"from avro.io import DatumReader"},{"symbol":"DatumWriter","correct":"from avro.io import DatumWriter"}],"quickstart":{"code":"import avro.schema\nfrom avro.datafile import DataFileReader, DataFileWriter\nfrom avro.io import DatumReader, DatumWriter\nimport io\n\n# Define schema\nschema_str = '''\n{\n    \"type\": \"record\",\n    \"name\": \"User\",\n    \"fields\": [\n        {\"name\": \"name\", \"type\": \"string\"},\n        {\"name\": \"favorite_number\", \"type\": [\"int\", \"null\"]},\n        {\"name\": \"favorite_color\", \"type\": [\"string\", \"null\"]}\n    ]\n}\n'''\nschema = avro.schema.parse(schema_str)\n\n# Prepare data (Python uses None where the Avro schema says null)\nusers = [\n    {\"name\": \"Alyssa\", \"favorite_number\": 256, \"favorite_color\": \"red\"},\n    {\"name\": \"Ben\", \"favorite_number\": 7, \"favorite_color\": \"blue\"},\n    {\"name\": \"Charlie\", \"favorite_number\": None, \"favorite_color\": \"green\"},\n    {\"name\": \"David\", \"favorite_number\": 42, \"favorite_color\": None}\n]\n\n# Write data to an in-memory Avro file\n# Using io.BytesIO for an in-memory file-like object\noutput_stream = io.BytesIO()\nwriter = DataFileWriter(output_stream, DatumWriter(), schema)\nfor user in users:\n    writer.append(user)\n# flush() writes the buffered block; close() would also close the BytesIO\nwriter.flush()\n\n# Reset stream position to read from the beginning\noutput_stream.seek(0)\n\n# Read data from the in-memory Avro file\nreader = DataFileReader(output_stream, DatumReader())\nprint(\"Reading Avro data:\")\nfor user in reader:\n    
print(user)\nreader.close()\n\noutput_stream.close()","lang":"python","description":"This quickstart demonstrates how to define an Avro schema, serialize Python dictionaries (records) into an Avro data file (here, in-memory using `io.BytesIO`), and then deserialize them back into Python dictionaries. It uses `avro.schema.parse` to load the schema, `DataFileWriter` and `DatumWriter` to write, and `DataFileReader` and `DatumReader` to read."},"warnings":[{"fix":"Ensure you are installing `avro` (i.e., `pip install avro`). If migrating from `avro-python3`, be aware of minor API differences, such as function capitalization (e.g., `avro.schema.parse` vs `avro.schema.Parse`).","message":"The `avro-python3` PyPI package is deprecated. Users should install and use the `avro` package instead; current releases support Python 3 only (older releases also supported Python 2). The `avro-python3` package will be removed in the near future.","severity":"deprecated","affected_versions":"<= 1.10.x of `avro-python3`, all versions of `avro`."},{"fix":"Use a Python 3 environment (>=3.9, per the package requirements) and install the current release with `pip install avro`. Current `avro` releases include the consolidated Python 3 support.","message":"Mixing Python major versions with the `avro` package leads to `SyntaxError`. Current releases target Python 3 and do not run in Python 2 environments, while `avro` releases published before version unification were written for Python 2 and fail under Python 3 on syntax such as `except Exception, e:`.","severity":"gotcha","affected_versions":"Python 2.x environments attempting to use Python 3+ compatible `avro` library. Python 3 environments if `avro` (Python 2 intended) was mistakenly installed before version unification."},{"fix":"For performance-critical applications, consider using alternative libraries like `fastavro` (available on PyPI: `pip install fastavro`), which uses C extensions for significantly improved speed. 
`fastavro` exposes its own function-based API (`fastavro.reader`, `fastavro.writer`) rather than the `DataFileReader`/`DataFileWriter` classes, and may not support Avro RPC.","message":"The official Python Avro library is implemented in pure Python, which can make it slow when processing large volumes of data or complex schemas.","severity":"gotcha","affected_versions":"All versions of the official `avro` library."},{"fix":"Carefully manage schema evolution. Ensure the reader schema can be resolved against the writer schema, for example by giving any field that exists only in the reader schema a default value. For robust applications, always validate your data against the expected schema and handle potential schema resolution errors.","message":"When reading Avro files, the reader's schema must be compatible with the writer's schema, adhering to Avro's schema resolution rules. Fields present in only one of the two schemas (especially reader fields without a default) or fields with mismatched types can lead to errors or unexpected data during deserialization.","severity":"gotcha","affected_versions":"All versions."}],"env_vars":null,"last_verified":"2026-04-05T00:00:00.000Z","next_check":"2026-07-04T00:00:00.000Z"}