PyArrow
23.0.1 · verified Tue May 12 · auth: no · python · install: verified · quickstart: verified
PyArrow is the Python library for Apache Arrow, which defines a standardized, language-independent columnar memory format for flat and hierarchical data. The current version is 23.0.1, released on March 28, 2026; releases follow a regular cadence.
pip install pyarrow
Common errors
error ModuleNotFoundError: No module named 'pyarrow._dataset'
cause 'pyarrow._dataset' is the compiled extension behind 'pyarrow.dataset'. The error means that extension is missing from the installed package, usually because the installation is incomplete or PyArrow was built without dataset support.
fix
Reinstall PyArrow with 'pip install --force-reinstall pyarrow', then confirm that 'import pyarrow.dataset' succeeds.
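To see which parts of an installation are actually importable, a small probe like the following may help. It is a minimal sketch using only the standard library; the `probe_modules` helper and the list of module names are illustrative choices, not part of PyArrow.

```python
import importlib


def probe_modules(names):
    """Try to import each module; map its name to None or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # ModuleNotFoundError is a subclass of ImportError.
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


if __name__ == "__main__":
    report = probe_modules(["pyarrow", "pyarrow.dataset", "pyarrow.parquet"])
    for name, error in report.items():
        print(f"{name}: {'ok' if error is None else error}")
```

If 'pyarrow' imports but 'pyarrow.dataset' does not, the base package is present and only the dataset extension is missing.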
error ValueError: The pyarrow library is not installed, please install pyarrow to use the to_arrow() function.
cause The calling library's 'to_arrow()' function depends on PyArrow, and PyArrow is not installed in the active Python environment.
fix
Install PyArrow into the same environment as the calling library using 'pip install pyarrow'.
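A frequent reason for this kind of error is that 'pip' installed the package into a different environment than the one running the code. The sketch below, which assumes nothing beyond the standard library, checks the running interpreter and builds a pip command that targets it; `install_hint` is a hypothetical helper name.

```python
import importlib.util
import sys


def install_hint(package):
    """Return a pip command for the running interpreter if package is missing."""
    if importlib.util.find_spec(package) is not None:
        return None  # already importable from this interpreter
    # Invoking pip through sys.executable targets this exact environment.
    return [sys.executable, "-m", "pip", "install", package]


if __name__ == "__main__":
    hint = install_hint("pyarrow")
    print("pyarrow is importable" if hint is None else " ".join(hint))
```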
error ImportError: libaws-cpp-sdk-s3.so: cannot open shared object file: No such file or directory
cause The installed PyArrow build links against the AWS SDK for C++ (used for S3 support), and the dynamic loader cannot find that shared library at import time.
fix
Install the AWS SDK for C++ shared library where the loader can find it, or reinstall PyArrow from a package that ships its own dependencies.
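Code that only sometimes needs S3 can treat that support as optional instead of failing at import time. The following is a sketch under that assumption; `load_s3_filesystem` is a hypothetical helper, while `pyarrow.fs.S3FileSystem` is the class PyArrow exposes when S3 support is available.

```python
def load_s3_filesystem():
    """Return (S3FileSystem, None) if available, else (None, error text)."""
    try:
        from pyarrow.fs import S3FileSystem
    except ImportError as exc:
        # Raised when PyArrow is absent, lacks S3 support, or a shared
        # library it links against cannot be loaded.
        return None, str(exc)
    return S3FileSystem, None


if __name__ == "__main__":
    s3_class, error = load_s3_filesystem()
    if s3_class is None:
        print(f"S3 support unavailable: {error}")
    else:
        print("S3 support available")
```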
error ImportError: DLL load failed while importing lib: The specified procedure could not be found.
cause A DLL that PyArrow's compiled 'lib' module depends on was loaded, but it does not export a function the module expects. This usually points to an incompatible DLL version rather than a missing file.
fix
Reinstall PyArrow, preferably in a clean environment, so that its DLLs match the installed build, and check that no conflicting copies of those DLLs are picked up from elsewhere on the system.
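When checking compatibility, it helps to know exactly which platform and interpreter are in use. This is a minimal standard-library sketch; the `environment_report` name and the chosen fields are illustrative.

```python
import platform
import struct
import sys


def environment_report():
    """Collect basic platform and interpreter details as a dict."""
    return {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "bits": struct.calcsize("P") * 8,  # pointer size: 32 or 64
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```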
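error AttributeError: module 'pyarrow' has no attribute '__version__'
cause The name 'pyarrow' resolved to something other than a complete PyArrow package, for example a broken installation, or a local file or directory named 'pyarrow' that shadows the installed library.
fix
Rename or remove any local 'pyarrow.py' file or 'pyarrow' directory on the import path, then upgrade or reinstall PyArrow using 'pip install --upgrade pyarrow'.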
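To find out what the name 'pyarrow' actually resolves to, the import machinery can be queried without importing the package. The sketch below is one way to do that with the standard library; `locate_package` is a hypothetical helper. An origin of None for a found package indicates a namespace package, which is what a stray directory without an '__init__.py' looks like.

```python
import importlib.metadata
import importlib.util


def locate_package(name):
    """Report where a top-level package resolves and its installed version."""
    spec = importlib.util.find_spec(name)
    try:
        # Read the version from installed metadata, not from __version__.
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "namespace_package": spec is not None and spec.origin is None,
        "version": version,
    }


if __name__ == "__main__":
    print(locate_package("pyarrow"))
```

An origin inside the project directory rather than 'site-packages' suggests that a local file is shadowing the installed library.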
Warnings
breaking PyArrow 23.0.1 introduces changes to the `pyarrow.fs` module that affect file system operations. The specific changes and migration steps are listed in the release notes.
fix Update code that uses `pyarrow.fs` to match the API described in the release notes.
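deprecated The `ParquetFile` class in `pyarrow.parquet` is listed as deprecated as of version 23.0.1, with `ParquetDataset` as the replacement. Confirm this against the release notes before migrating: `ParquetFile` reads a single file, while `ParquetDataset` is aimed at datasets that may span several files, so the two are not drop-in equivalents.
fix Where the release notes confirm the deprecation, replace `ParquetFile` with `ParquetDataset` and check that each call site still behaves as expected.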
Install compatibility verified last tested: 2026-05-12
python  os / libc      status  wheel  install  import  disk
3.9     alpine (musl)          -      -        0.06s   173.4M
3.9     slim (glibc)           -      -        0.06s   156M
3.10    alpine (musl)          -      -        0.07s   187.4M
3.10    slim (glibc)           -      -        0.04s   165M
3.11    alpine (musl)          -      -        0.10s   191.4M
3.11    slim (glibc)           -      -        0.08s   169M
3.12    alpine (musl)          -      -        0.08s   183.0M
3.12    slim (glibc)           -      -        0.10s   160M
3.13    alpine (musl)          -      -        0.07s   182.7M
3.13    slim (glibc)           -      -        0.07s   160M
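Import time and disk footprint like those in the table can be approximated locally. The sketch below is one possible approach and not necessarily how the figures above were measured: it times the import in a fresh interpreter and sums file sizes under the package directory, whereas the table's disk column may cover more than the package itself. `time_import` and `package_size_bytes` are hypothetical helper names.

```python
import importlib.util
import os
import subprocess
import sys


def time_import(module_name):
    """Seconds a fresh interpreter spends importing module_name."""
    code = (
        "import time; start = time.perf_counter(); "
        f"import {module_name}; "
        "print(time.perf_counter() - start)"
    )
    out = subprocess.run(
        [sys.executable, "-c", code], check=True, capture_output=True, text=True
    )
    # Use the last line in case the module prints while importing.
    return float(out.stdout.strip().splitlines()[-1])


def package_size_bytes(package_name):
    """Total size of the files under a package's directories, in bytes."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return 0
    total = 0
    for root in spec.submodule_search_locations:
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                total += os.path.getsize(os.path.join(dirpath, filename))
    return total


if __name__ == "__main__":
    # "json" is used so the example runs without PyArrow installed.
    print(f"import: {time_import('json'):.2f}s")
    print(f"disk: {package_size_bytes('json') / 1e6:.1f}M")
```

Substituting 'pyarrow' for 'json' gives figures for a local installation; they will differ from the table depending on hardware and caching.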
Imports
- pa
import pyarrow as pa
Quickstart verified last tested: 2026-04-23
import pyarrow as pa
import pyarrow.parquet as pq

# Create a simple Arrow Table
data = {'column1': [1, 2, 3], 'column2': ['A', 'B', 'C']}
table = pa.table(data)

# Write the Table to a Parquet file
pq.write_table(table, 'example.parquet')

# Read the Table back from the Parquet file
table_read = pq.read_table('example.parquet')
print(table_read)