parquet-tools
0.2.16 verified Mon Apr 27 auth: no python
A command-line tool and Python library for inspecting, merging, and converting Parquet files. Version 0.2.16 requires Python 3.9+. Releases are published quarterly.
pip install parquet-tools
Common errors
error ModuleNotFoundError: No module named 'parquet_tools'
cause The package is not installed, or the module name is misspelled (e.g., 'parquet-tools' instead of 'parquet_tools').
fix Install with 'pip install parquet-tools' and import with 'from parquet_tools import ...'.
error AttributeError: module 'parquet_tools' has no attribute 'inspect'
cause Incorrect import path. The module name is 'parquet_tools', and 'inspect' is a function at its top level.
fix Use 'from parquet_tools import inspect'.
Warnings
gotcha The package name on PyPI is 'parquet-tools' (with a hyphen), but the Python module uses an underscore: 'parquet_tools'. Import with 'from parquet_tools import ...', not 'from parquet-tools import ...'.
fix Use underscore in import statements.
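The split between the two names can be checked from Python itself. The sketch below uses only the standard library: importlib.metadata looks up the hyphenated distribution name, while importlib.util.find_spec looks up the underscore module name. The helper name check_install is hypothetical and not part of parquet-tools.

```python
from importlib import metadata, util


def check_install(dist_name: str, module_name: str) -> dict:
    """Report whether a distribution and its import module are both visible."""
    try:
        # Distribution (PyPI) name, e.g. 'parquet-tools'
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # Import (module) name, e.g. 'parquet_tools'
    importable = util.find_spec(module_name) is not None
    return {"version": version, "importable": importable}


if __name__ == "__main__":
    print(check_install("parquet-tools", "parquet_tools"))
```

If 'version' is None, the package is not installed in the active environment. If 'version' is set but 'importable' is False, the import name is likely misspelled.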
gotcha The 'inspect' function prints the schema to stdout rather than returning a Python object. You cannot programmatically access the schema from its return value.
fix Capture stdout (e.g., with contextlib.redirect_stdout) or use pyarrow.parquet.read_schema directly.
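A minimal sketch of the stdout-capture approach, using only the standard library. The helper capture_stdout is hypothetical, and fake_inspect is a stand-in that prints the way 'inspect' is described above, so the example runs without parquet-tools installed. Note that contextlib.redirect_stdout only captures writes made through Python's sys.stdout, not output written directly to the file descriptor by native code.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs) -> str:
    """Call func and return everything it printed to stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def fake_inspect(path: str) -> None:
    # Hypothetical stand-in for a function that prints and returns None.
    print(f"schema of {path}")


if __name__ == "__main__":
    text = capture_stdout(fake_inspect, "test.parquet")
    print(repr(text))
```

With parquet-tools installed, the same helper could presumably be called as capture_stdout(inspect, 'test.parquet'). The result is plain text that still needs parsing, which is why pyarrow.parquet.read_schema is usually the more convenient route when a schema object is needed.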
gotcha The 'merge' function may change row ordering. It does not guarantee preservation of original row groups or sort order.
fix Sort data beforehand if order matters, or use alternative methods like pyarrow.parquet.write_table.
gotcha The CLI tool is installed as 'parquet-tools' (with a hyphen). Running 'parquet-tools --help' works, but 'parquet_tools' as a command does not.
fix Use hyphen in command name: 'parquet-tools'.
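When calling the CLI from a script, it can help to confirm that the hyphenated command is on PATH before running it. The sketch below uses shutil.which and subprocess from the standard library. The helper name cli_help is hypothetical. The only CLI invocation it relies on is 'parquet-tools --help', as mentioned above.

```python
import shutil
import subprocess
from typing import Optional


def cli_help(command: str = "parquet-tools") -> Optional[str]:
    """Return the --help output of the command, or None if it is not on PATH."""
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--help"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    text = cli_help()
    print("not found on PATH" if text is None else text)
```

A None result for 'parquet-tools' suggests the package is not installed in the active environment or its scripts directory is not on PATH.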
Imports
- inspect
wrong: from parquet-tools import inspect
correct: from parquet_tools import inspect
- merge
from parquet_tools import merge
- show
from parquet_tools import show
Quickstart
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
from parquet_tools import inspect

# Create a small Parquet file for inspection
df = pd.DataFrame({'x': [1, 2], 'y': ['a', 'b']})
pq.write_table(pa.Table.from_pandas(df), 'test.parquet')

# Inspect the schema (printed to stdout, not returned)
inspect('test.parquet')