parquet-tools

Version 0.2.16 | verified Mon Apr 27 | auth: no | language: Python

A command-line tool and Python library for inspecting, merging, and converting Parquet files. Version 0.2.16 requires Python 3.9+, and the project follows a quarterly release schedule.

pip install parquet-tools
error ModuleNotFoundError: No module named 'parquet_tools'
cause Package not installed, or misspelled module name (e.g., 'parquet-tools' instead of 'parquet_tools').
fix Install with 'pip install parquet-tools' and import with 'from parquet_tools import ...'.
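To confirm the install before debugging imports, the standard library's importlib.metadata (Python 3.8+) can look up the installed distribution. This is a minimal sketch; the helper name installed_version is hypothetical. Note that it takes the distribution name as published on PyPI ("parquet-tools", with a hyphen), not the import name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "parquet-tools") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing.

    `dist_name` is the PyPI distribution name (hyphenated for parquet-tools),
    not the module name used in import statements.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("parquet-tools is not installed; run: pip install parquet-tools")
    else:
        print(f"parquet-tools {version} is installed; import it as parquet_tools")
```

If this reports a version but the import still fails, the interpreter running your code is likely not the one pip installed into.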
error AttributeError: module 'parquet_tools' has no attribute 'inspect'
cause Incorrect import path. The module is 'parquet_tools', and this page documents 'inspect' as a top-level function of that package, so it should be imported from the package directly.
fix Use 'from parquet_tools import inspect'.
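When an AttributeError like this appears, it can help to list what the installed module actually exposes at the top level. The sketch below uses only importlib and dir(); the helper names public_names and has_symbol are hypothetical, and the parquet_tools check in the demo depends on what your installed version provides.

```python
import importlib


def public_names(module_name: str) -> list[str]:
    """Import a module by name and return its public (non-underscore) attributes, sorted."""
    module = importlib.import_module(module_name)
    return sorted(name for name in dir(module) if not name.startswith("_"))


def has_symbol(module_name: str, symbol: str) -> bool:
    """Return True if `symbol` is an attribute of the imported module."""
    module = importlib.import_module(module_name)
    return hasattr(module, symbol)


if __name__ == "__main__":
    # Check whether this install exposes `inspect` at the package's top level.
    try:
        print("inspect at top level:", has_symbol("parquet_tools", "inspect"))
        print("public names:", public_names("parquet_tools"))
    except ModuleNotFoundError:
        print("parquet_tools is not installed")
```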
gotcha The package name on PyPI is 'parquet-tools' (with a hyphen), but the Python module uses an underscore: 'parquet_tools'. Import with 'from parquet_tools import ...', not 'from parquet-tools import ...' (the latter is a SyntaxError).
fix Use underscore in import statements.
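The hyphen/underscore difference can be checked without importing anything by asking the import system whether it can locate a module. This is a small sketch using importlib.util.find_spec; the helper name is_importable is hypothetical. The hyphenated name never resolves, because no module file can carry that name.

```python
from importlib.util import find_spec


def is_importable(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module by this name."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # The PyPI name uses a hyphen; only the underscore form can be imported.
    for name in ("parquet-tools", "parquet_tools"):
        print(f"{name!r}: importable={is_importable(name)}")
```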
gotcha The 'inspect' function prints the schema to stdout instead of returning a Python object, so you cannot access the schema programmatically through its return value.
fix Capture stdout (e.g., with contextlib.redirect_stdout) or use pyarrow.parquet.read_schema directly.
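gotcha The 'merge' function may change row ordering. It does not guarantee that the original row groups or sort order are preserved.
fix Re-sort the data after merging if order matters, or merge with pyarrow directly: read each file, combine them with pyarrow.concat_tables (which keeps the tables in the order given), and write the result with pyarrow.parquet.write_table.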
gotcha The CLI tool is installed as 'parquet-tools' (with hyphen). Running 'parquet-tools --help' works, but 'parquet_tools' as a command does not.
fix Use hyphen in command name: 'parquet-tools'.
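When calling the CLI from a script, resolving the hyphenated command with shutil.which gives a clear error if it is not on PATH. This is a minimal sketch; run_cli and CLI_NAME are hypothetical names, and the only subcommand used is '--help', which this page confirms works.

```python
import shutil
import subprocess

CLI_NAME = "parquet-tools"  # hyphenated; "parquet_tools" is not the command name


def run_cli(*args: str, exe: str = CLI_NAME) -> subprocess.CompletedProcess:
    """Run the CLI with the given arguments and capture stdout/stderr as text."""
    path = shutil.which(exe)
    if path is None:
        raise FileNotFoundError(f"{exe!r} not found on PATH; try: pip install parquet-tools")
    return subprocess.run([path, *args], capture_output=True, text=True, check=False)


if __name__ == "__main__":
    try:
        result = run_cli("--help")
        print(result.stdout)
    except FileNotFoundError as err:
        print(err)
```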

Quick inspection of a Parquet file's schema.

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from parquet_tools import inspect

# Create a small Parquet file to inspect
df = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})
pq.write_table(pa.Table.from_pandas(df), 'test.parquet')

# Print the schema (inspect writes to stdout; it does not return the schema)
inspect('test.parquet')