{"id":7003,"library":"arro3-io","title":"arro3-io","description":"arro3-io is a Python library that provides streaming-capable readers and writers for Apache Arrow-compatible data formats, including Parquet, Arrow IPC, JSON, and CSV. It is part of the `arro3` ecosystem, a minimal Python interface to Apache Arrow's Rust implementation and a more lightweight alternative to PyArrow. Its readers are streaming-first: they return lazy iterators, so larger-than-memory datasets can be processed batch by batch. The current version is 0.8.0, and the library interoperates with other Python data libraries that implement the Arrow PyCapsule Interface.","status":"active","version":"0.8.0","language":"en","source_language":"en","source_url":"https://github.com/kylebarron/arro3","tags":["apache arrow","data processing","parquet","ipc","csv","json","streaming","rust","pyo3"],"install":[{"cmd":"pip install arro3-io","lang":"bash","label":"Install only arro3-io"},{"cmd":"pip install arro3-core arro3-io arro3-compute","lang":"bash","label":"Install full arro3 ecosystem"}],"dependencies":[{"reason":"Provides core Arrow data structures (Table, RecordBatch) which arro3-io operates on; part of the same namespace package.","package":"arro3-core","optional":false},{"reason":"Often used for creating/consuming Arrow data compatible with arro3-io, though not strictly required as arro3-io can interoperate with any Arrow PyCapsule-compliant library.","package":"pyarrow","optional":true},{"reason":"Commonly used for data manipulation and converting to/from Arrow formats for use with arro3-io.","package":"pandas","optional":true}],"imports":[{"symbol":"read_parquet","correct":"from arro3.io import read_parquet"},{"symbol":"write_parquet","correct":"from arro3.io import write_parquet"},{"symbol":"read_ipc","correct":"from arro3.io import read_ipc"},{"symbol":"write_ipc","correct":"from arro3.io import 
write_ipc"},{"note":"`Table` is a core data structure provided by `arro3-core`, not directly by `arro3-io`.","wrong":"from arro3.io import Table","symbol":"Table","correct":"from arro3.core import Table"}],"quickstart":{"code":"import arro3.io\nimport arro3.core\nimport pyarrow as pa\nimport pandas as pd\nimport io\n\n# 1. Create some dummy data using pandas and pyarrow\ndf = pd.DataFrame({\"col1\": [1, 2, 3], \"col2\": [\"A\", \"B\", \"C\"]})\npa_table = pa.Table.from_pandas(df)\n\n# 2. Write the data to an in-memory buffer as a Parquet file using arro3.io\nbuffer = io.BytesIO()\narro3.io.write_parquet(pa_table, buffer)\nbuffer.seek(0)\n\n# 3. Read the Parquet data back from the buffer using arro3.io\n# arro3.io.read_parquet returns a RecordBatchReader (an iterator)\nreader = arro3.io.read_parquet(buffer)\n\n# 4. Materialize the streaming RecordBatchReader into an arro3.core.Table\narro3_table = arro3.core.Table(reader)\n\nprint(\"Original Pandas DataFrame:\")\nprint(df)\nprint(\"\\narro3 Table read back:\")\nprint(arro3_table)\n\n# 5. Demonstrate interoperability: pyarrow consumes the arro3.core.Table\n# through the Arrow PyCapsule Interface (requires pyarrow >= 14)\npa_roundtrip = pa.table(arro3_table)\nprint(\"\\narro3 Table converted to PyArrow Table:\")\nprint(pa_roundtrip)\nprint(\"\\narro3 Table converted to Pandas DataFrame:\")\nprint(pa_roundtrip.to_pandas())","lang":"python","description":"This quickstart demonstrates how to use `arro3-io` to write and read Apache Arrow-compatible data. It creates data with Pandas and PyArrow, writes it to an in-memory buffer using `arro3.io.write_parquet`, then reads it back with `arro3.io.read_parquet`. The streaming `RecordBatchReader` is then materialized into an `arro3.core.Table`, and finally converted back to PyArrow and Pandas to highlight interoperability."},"warnings":[{"fix":"Review code that relies on explicit nullability assumptions when interacting with `DataType` objects passed between `arro3` and `pyarrow`. 
Ensure your code handles `nullable: true` for bare `DataType` objects where applicable.","message":"In version 0.8.0, the serialization of a bare `DataType` through `__arrow_c_schema__` (e.g., when passing to `pyarrow.field`) now explicitly sets `nullable: true` to match PyArrow's equality semantics.","severity":"breaking","affected_versions":"0.7.x to 0.8.0"},{"fix":"Ensure you install `arro3-core` alongside `arro3-io` (`pip install arro3-core arro3-io`) and import core data structures from `arro3.core` (e.g., `from arro3.core import Table`).","message":"The `arro3` project is distributed as modular namespace packages (`arro3-core`, `arro3-io`, `arro3-compute`). While `arro3-io` handles I/O, core Arrow data structures like `Table` or `RecordBatch` are provided by `arro3-core`. Users often need to install and import from `arro3-core` for full functionality.","severity":"gotcha","affected_versions":"All versions"},{"fix":"To materialize the data, call `.read_all()` on the `RecordBatchReader` or pass the reader directly to `arro3.core.Table()` (e.g., `table = arro3.core.Table(reader)`).","message":"`arro3.io`'s read functions (e.g., `read_parquet`) return a `RecordBatchReader`, which is a lazy iterator. If you need to work with the entire dataset in memory, you must explicitly materialize it.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Change your import statement from `import arrow3.io` to `import arro3.io`. Ensure you have installed the correct package: `pip install arro3-io`.","cause":"The package name is `arro3-io` (spelled `arro3`, with no 'w') and its modules live under the `arro3` namespace, not `arrow3`.","error":"ModuleNotFoundError: No module named 'arrow3.io'"},{"fix":"Import `Table` from `arro3.core`: `from arro3.core import Table`. 
You may also need to install `arro3-core` if you haven't already: `pip install arro3-core`.","cause":"The `Table` class, a fundamental Arrow data structure, is provided by the `arro3-core` package, not `arro3-io`.","error":"AttributeError: module 'arro3.io' has no attribute 'Table'"},{"fix":"Materialize the `RecordBatchReader` into a `Table` or iterate over it. For example, `table = arro3.core.Table(reader)` or `for batch in reader: ...`.","cause":"You are attempting to access a `RecordBatchReader` (which is an iterator) like a list or array before materializing its contents.","error":"TypeError: 'RecordBatchReader' object is not subscriptable"}]}