fastparquet
fastparquet is a Python library providing performant read/write support for the Parquet file format, without needing a Python-Java bridge. It integrates well with Python-based big data workflows, particularly Dask and Pandas (versions < 3.0). As of March 2026, with Pandas 3.0 explicitly depending on PyArrow, `fastparquet` is being retired, and no further development is anticipated, though it remains usable for Pandas 2.x users.
Warnings
- breaking The `fastparquet` project is being retired and is incompatible with `pandas` 3.0 and newer versions. Pandas 3.0 now explicitly depends on `pyarrow`, superseding `fastparquet` for many common workflows. Users should target `pandas<3.0` for continued use or migrate to `pyarrow`.
- gotcha Performance can be significantly impacted by the presence of NULL values and variable-length string encoding in your data. For optimal performance, consider using sentinel values (e.g., NaN) for data types that support them, or fixed-length strings where compatible with your ecosystem.
- gotcha When installing `fastparquet` via `pip`, it's advisable to install `numpy` first to aid the dependency resolver. If pre-compiled wheels are not available for your system/Python version, or when installing directly from the GitHub repository, a C compiler toolchain and `cython` are required for compilation.
Install
- pip install fastparquet
- conda install -c conda-forge fastparquet
Imports
- ParquetFile
from fastparquet import ParquetFile
- write
from fastparquet import write
Quickstart
import pandas as pd
from fastparquet import write, ParquetFile
import os
# Create a sample DataFrame
df = pd.DataFrame({
'col1': [1, 2, 3, 4],
'col2': ['A', 'B', 'C', 'D'],
'col3': [True, False, True, False]
})
filename = "example.parquet"
# Write the DataFrame to a Parquet file with Snappy compression
write(filename, df, compression='SNAPPY')
print(f"DataFrame successfully written to '{filename}'.")
# Read the Parquet file back into a DataFrame
pf = ParquetFile(filename)
df_read = pf.to_pandas()
print(f"DataFrame successfully read from '{filename}':")
print(df_read)
# Clean up the created file
os.remove(filename)