Parquet Metadata Tool
parquet-metadata is a Python-based command-line tool, version 0.0.1, that displays metadata about a Parquet file: its structure, columns, row groups, and basic statistics. The project appears to be unmaintained since its last release in 2018, so it is best treated as a historical reference rather than an actively developed library for programmatic use.
Common errors
- ModuleNotFoundError: No module named 'parquet_metadata'
  cause Attempting to import `parquet_metadata` directly as a Python library module, which it is not designed to be.
  fix This package is a command-line tool. Instead of importing it, run it from your terminal: `parquet-metadata your_file.parquet`. If you must run it from Python, use `subprocess.run(['parquet-metadata', 'your_file.parquet'])`.
- parquet-metadata: command not found
  cause The `parquet-metadata` executable is not in your system's PATH, usually due to an incomplete `pip install` or an environment issue.
  fix Ensure `pip install parquet-metadata` completed successfully. Verify that your `PATH` environment variable includes the directory where pip installs scripts (e.g., `~/.local/bin` or a virtual environment's `bin`/`Scripts` directory). Reinstalling in a fresh virtual environment often helps.
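The PATH check described above can be sketched in Python using only the standard library. The helper names `locate_cli` and `scripts_dir` are hypothetical, introduced here for illustration; the sketch assumes the executable is named `parquet-metadata`, as in the error message.

```python
import shutil
import sysconfig


def locate_cli(name: str = "parquet-metadata"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


def scripts_dir() -> str:
    """Return the scripts directory of the current interpreter's default
    install scheme. Note: `pip install --user` uses a different directory
    (for example ~/.local/bin on Linux)."""
    return sysconfig.get_path("scripts")


if __name__ == "__main__":
    found = locate_cli()
    if found:
        print(f"parquet-metadata found at: {found}")
    else:
        print("parquet-metadata is not on PATH.")
        print(f"Scripts for this interpreter are installed in: {scripts_dir()}")
```

If the tool is reported as missing while the printed scripts directory does contain it, adding that directory to `PATH` is the likely remedy.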
Warnings
- deprecated The `parquet-metadata` package (version 0.0.1) was last updated in 2018 and is no longer actively maintained. It lacks modern features and bug fixes found in newer Parquet introspection tools.
- gotcha This package is a command-line utility and does not provide a stable Python API for programmatic interaction. Direct imports from its internal scripts are not recommended or supported.
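For programmatic access, one possible alternative is to read the Parquet footer with pyarrow, the library already used in the Quickstart. This is an assumption about what a suitable replacement looks like, not something the parquet-metadata project recommends; the function name `summarize_parquet` is hypothetical.

```python
def summarize_parquet(path: str) -> dict:
    """Collect basic footer metadata from a Parquet file using pyarrow."""
    # Imported lazily so this module can be loaded without pyarrow installed.
    import pyarrow.parquet as pq

    meta = pq.read_metadata(path)  # reads only the file footer, not the data
    return {
        "num_rows": meta.num_rows,
        "num_columns": meta.num_columns,
        "num_row_groups": meta.num_row_groups,
        "created_by": meta.created_by,
        "column_names": meta.schema.names,
    }
```

Calling `summarize_parquet('your_file.parquet')` returns a plain dictionary, which avoids parsing the text output of a command-line tool.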
Install
- pip install parquet-metadata
Imports
- Not applicable. This package is primarily a command-line tool and exposes no importable module; a statement such as `from parquet_metadata import some_function` fails with ModuleNotFoundError.
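Since there is nothing to import, calling the tool from Python means going through `subprocess`. The following is a minimal sketch of a reusable wrapper; the function name `run_metadata_cli` and its `command` parameter are hypothetical, and the output is returned as raw text because the tool's output format is not documented here.

```python
import subprocess
from typing import Sequence


def run_metadata_cli(path: str, command: Sequence[str] = ("parquet-metadata",)) -> str:
    """Run the CLI on `path` and return its standard output as text."""
    try:
        result = subprocess.run(
            [*command, path], capture_output=True, text=True, check=True
        )
    except FileNotFoundError as exc:
        # The executable itself could not be found (the PATH problem).
        raise RuntimeError(f"{command[0]!r} was not found on PATH") from exc
    except subprocess.CalledProcessError as exc:
        # The tool ran but exited with a non-zero status.
        raise RuntimeError(
            f"command failed with exit code {exc.returncode}: {exc.stderr}"
        ) from exc
    return result.stdout
```

Typical use would be `print(run_metadata_cli('your_file.parquet'))`. The `command` parameter exists only so the executable can be overridden, for example with a full path inside a virtual environment.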
Quickstart
import os
import subprocess

# Create a dummy Parquet file for demonstration (requires pyarrow).
# In a real scenario, you would use an existing Parquet file.
dummy_file = 'dummy.parquet'
try:
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
    pq.write_table(table, dummy_file)
    print(f"Created dummy Parquet file: {dummy_file}")

    # Run the parquet-metadata command-line tool
    command = ['parquet-metadata', dummy_file]
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    print("\n--- Parquet Metadata Output ---")
    print(result.stdout)
    if result.stderr:
        print("--- Errors ---")
        print(result.stderr)
except ImportError:
    print("Pyarrow not installed. Cannot create dummy parquet file for quickstart.")
    print("To run quickstart, install pyarrow: pip install pyarrow")
    print("You can still try 'parquet-metadata your_file.parquet' in your terminal.")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up the dummy file even if the command failed
    if os.path.exists(dummy_file):
        os.remove(dummy_file)
        print(f"Removed dummy Parquet file: {dummy_file}")