{"id":8398,"library":"pandavro","title":"Pandavro","description":"Pandavro provides a convenient interface to read and write Avro files using pandas DataFrames. It simplifies the serialization and deserialization of tabular data between Python's pandas library and the Avro data format. The current version is 1.9.0, and it maintains an active release schedule with updates for Python, pandas, and NumPy compatibility.","status":"active","version":"1.9.0","language":"en","source_language":"en","source_url":"https://github.com/ynqa/pandavro","tags":["avro","pandas","dataframe","data-serialization","data-exchange"],"install":[{"cmd":"pip install pandavro","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Core dependency for DataFrame operations.","package":"pandas"},{"reason":"Backend Avro serialization/deserialization library.","package":"fastavro"}],"imports":[{"symbol":"to_avro","correct":"import pandavro as pa\npa.to_avro(...)"},{"symbol":"read_avro","correct":"import pandavro as pa\npa.read_avro(...)"}],"quickstart":{"code":"import pandas as pd\nimport pandavro as pa\nimport io\n\n# 1. Create a pandas DataFrame\ndf = pd.DataFrame({\n    'id': [1, 2, 3],\n    'name': ['Alice', 'Bob', 'Charlie'],\n    'value': [10.1, 20.2, 30.3]\n})\n\nprint(\"Original DataFrame:\")\nprint(df)\n\n# 2. Write DataFrame to an Avro file (using BytesIO for in-memory example)\noutput_buffer = io.BytesIO()\npa.to_avro(output_buffer, df)  # the Avro schema is inferred from the DataFrame\n\n# 3. 
Read Avro data back into a DataFrame\noutput_buffer.seek(0) # Reset buffer position for reading\nread_df = pa.read_avro(output_buffer)\n\nprint(\"\\nRead DataFrame from Avro:\")\nprint(read_df)\n\n# You can also use file paths directly:\n# pa.to_avro('output.avro', df)\n# loaded_df = pa.read_avro('output.avro')","lang":"python","description":"This quickstart demonstrates how to create a pandas DataFrame, write it to an Avro in-memory stream using `pandavro.to_avro()`, and then read it back into a new DataFrame with `pandavro.read_avro()`."},"warnings":[{"fix":"Review your application's reliance on pandas/NumPy internals and behavior when upgrading to pandas 2.0+ or NumPy 2.0+. Test thoroughly after upgrading underlying dependencies.","message":"Version 1.9.0 introduces official support for pandas 2.0 and NumPy 2.0. While `pandavro` itself is adapted, upgrading these underlying libraries in your environment might introduce breaking changes in your own code, especially regarding pandas' copy-on-write behavior or NumPy's API changes.","severity":"breaking","affected_versions":">=1.9.0"},{"fix":"Inspect the inferred schema if schema compatibility is critical (e.g., by writing to a file and examining it with an Avro tool). Pre-process your DataFrame to ensure consistent dtypes for columns before writing, or explicitly cast types to match your desired Avro schema.","message":"`pandavro` infers Avro schemas from pandas DataFrames. This inference might not perfectly align with pre-existing Avro schemas or desired Avro types, particularly for mixed-type columns, generic `object` dtypes, or specific handling of `NaN`/`None` values, leading to unexpected schemas or data type conversions.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Standardize `NaN` and `None` handling in your DataFrame. Use `df.fillna()` or explicit type conversions (e.g., `pd.Int64Dtype()`) to ensure consistent nullable types. 
For `object` columns, ensure all non-null values are of a consistent type that maps well to Avro (e.g., all strings).","message":"Handling of `NaN` (Not a Number) and `None` values can lead to subtle issues. `pandavro` typically maps `NaN` in numeric columns to `null` within an Avro union type (e.g., `[\"null\", \"double\"]`). However, `None` in `object` columns might result in `string` or `bytes` types depending on other data, potentially causing schema mismatches.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure DataFrame columns have consistent dtypes. Handle `NaN`/`None` values explicitly (e.g., `df.fillna(value)` or converting to nullable pandas dtypes like `pd.Int64Dtype()`) or pre-process `object` columns to ensure all values are of a consistent, Avro-compatible type.","cause":"Data in a DataFrame column does not conform to the Avro schema inferred by `pandavro` or to an expected schema. This often happens due to mixed types in a column or unexpected `NaN`/`None` values.","error":"fastavro.validation.ValidationError: The datum ... is not of type ..."},{"fix":"Ensure the second argument to `pandavro.to_avro()` is always a `pandas.DataFrame` object. 
Convert dictionaries or Series to DataFrames first (e.g., `pd.DataFrame(your_dict)` or `your_series.to_frame()`).","cause":"Attempting to pass a Python dictionary, a pandas Series, or another non-DataFrame object directly to `pandavro.to_avro()`, which expects a `pandas.DataFrame` as its second argument. Schema inference reads DataFrame attributes such as `dtypes`, which a plain dictionary does not have.","error":"AttributeError: 'dict' object has no attribute 'dtypes'"},{"fix":"Pre-process the DataFrame to flatten complex structures or convert custom objects into basic, Avro-compatible types such as strings (e.g., by serializing them to JSON strings), integers, or floats before calling `pandavro.to_avro()`.","cause":"A column in the DataFrame contains complex Python objects (e.g., custom classes, unflattened nested lists/dicts, or non-standard types) that `pandavro` cannot automatically map to a standard Avro type.","error":"ValueError: Cannot convert object of type <class 'some_complex_type'> to Avro type"}]}