fastparquet
fastparquet is a Python library providing performant read/write support for the Parquet file format, without needing a Python-Java bridge. It integrates well with Python-based big data workflows, particularly Dask and Pandas (versions < 3.0). As of March 2026, with Pandas 3.0 explicitly depending on PyArrow, `fastparquet` is being retired, and no further development is anticipated, though it remains usable for Pandas 2.x users.
Common errors
- AttributeError: module 'fastparquet.parquet_thrift' has no attribute 'SchemaElement'
  Cause: A version incompatibility between `fastparquet`, `pyarrow`, and `pandas`, typically when `fastparquet` 0.8.0 or newer is used with `pyarrow` older than 5.0.0 on certain Python/Pandas combinations (e.g., Python 3.6.9 with Pandas 1.1.5).
  Fix: Downgrade `fastparquet` to an older, compatible version such as 0.7.2, and pair it with a compatible `pyarrow` (e.g., `pyarrow==5.0.0`): `pip install fastparquet==0.7.2 pyarrow==5.0.0`
- ModuleNotFoundError: No module named 'fastparquet'
  Cause: The `fastparquet` library is not installed in the Python environment running the code, or the intended environment is not activated.
  Fix: Install `fastparquet` with pip or conda, making sure the installation targets the correct environment: `pip install fastparquet` (or `conda install -c conda-forge fastparquet` with Anaconda)
- RuntimeError: Compression 'snappy' not available. Options: ['GZIP', 'UNCOMPRESSED']
  Cause: The requested compression library (e.g., `snappy`, `lz4`, `zstandard`, `brotli`) is not installed or not properly configured, even though `fastparquet` supports it.
  Fix: Install the missing compression library. For 'snappy', install `python-snappy`; for the others, install the corresponding Python package (`lz4`, `zstandard`, `brotli`): `pip install python-snappy` (or `conda install -c conda-forge python-snappy`)
- ValueError: Can't infer object conversion type: 0 (6.0, 1.0, 1.0, 1.0, 1.0)
  Cause: A DataFrame column contains complex or mixed data types (lists, tuples, or other objects) that `fastparquet` cannot automatically map to a Parquet-compatible type.
  Fix: Explicitly convert the problematic column(s) to a string or another simple, consistent type before writing, or preprocess the data to ensure uniform, compatible types: `df['problematic_column'] = df['problematic_column'].astype(str)`
- ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. A suitable version of pyarrow or fastparquet is required for parquet support.
  Cause: `pandas.read_parquet` found neither `pyarrow` nor `fastparquet` correctly installed in the environment to use as a Parquet engine.
  Fix: Ensure at least one engine is installed and importable. `pandas >= 3.0` explicitly depends on `pyarrow`; for `pandas < 3.0` either engine is sufficient. If `fastparquet` is intended, ensure it is installed: `pip install pyarrow fastparquet`
Warnings
- breaking The `fastparquet` project is being retired and is incompatible with `pandas` 3.0 and newer versions. Pandas 3.0 now explicitly depends on `pyarrow`, superseding `fastparquet` for many common workflows. Users should target `pandas<3.0` for continued use or migrate to `pyarrow`.
- gotcha Performance can be significantly impacted by the presence of NULL values and variable-length string encoding in your data. For optimal performance, consider using sentinel values (e.g., NaN) for data types that support them, or fixed-length strings where compatible with your ecosystem.
- gotcha When installing `fastparquet` via `pip`, it's advisable to install `numpy` first to aid the dependency resolver. If pre-compiled wheels are not available for your system/Python version, or when installing directly from the GitHub repository, a C compiler toolchain and `cython` are required for compilation.
Install
- pip install fastparquet
- conda install -c conda-forge fastparquet
Imports
- ParquetFile
from fastparquet import ParquetFile
- write
from fastparquet import write
Quickstart
import pandas as pd
from fastparquet import write, ParquetFile
import os
# Create a sample DataFrame
df = pd.DataFrame({
'col1': [1, 2, 3, 4],
'col2': ['A', 'B', 'C', 'D'],
'col3': [True, False, True, False]
})
filename = "example.parquet"
# Write the DataFrame to a Parquet file with Snappy compression
# (requires python-snappy; use compression='GZIP' or None if it is unavailable)
write(filename, df, compression='SNAPPY')
print(f"DataFrame successfully written to '{filename}'.")
# Read the Parquet file back into a DataFrame
pf = ParquetFile(filename)
df_read = pf.to_pandas()
print(f"DataFrame successfully read from '{filename}':")
print(df_read)
# Clean up the created file
os.remove(filename)