Parquet Metadata Tool

0.0.1 · abandoned · verified Thu Apr 16

parquet-metadata is a Python-based command-line tool that displays metadata about a Parquet file: its structure, columns, row groups, and basic statistics. The project appears to have been unmaintained since its last release (0.0.1) in 2018, so it is best treated as a historical reference rather than an actively developed library for programmatic use.

Quickstart

The `parquet-metadata` package is intended for command-line use. This quickstart first creates a dummy Parquet file for testing (which requires `pyarrow`), then invokes the `parquet-metadata` CLI from Python using `subprocess` to inspect that file's metadata.

import os
import subprocess

# Create a dummy Parquet file for demonstration (requires pyarrow).
# In a real scenario, you would point the tool at an existing Parquet file.
try:
    import pyarrow as pa
    import pyarrow.parquet as pq
except ImportError:
    print("pyarrow is not installed, so the dummy Parquet file cannot be created.")
    print("To run this quickstart, install it with: pip install pyarrow")
    print("You can still try 'parquet-metadata your_file.parquet' in your terminal.")
else:
    dummy_file = 'dummy.parquet'
    table = pa.table({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})
    pq.write_table(table, dummy_file)
    print(f"Created dummy Parquet file: {dummy_file}")

    try:
        # Run the parquet-metadata command-line tool. check=True raises
        # CalledProcessError if the tool exits with a non-zero status.
        command = ['parquet-metadata', dummy_file]
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        print("\n--- Parquet Metadata Output ---")
        print(result.stdout)
        if result.stderr:
            print("--- Stderr ---")
            print(result.stderr)
    except FileNotFoundError:
        # Raised by subprocess.run when the executable is not on PATH.
        print("The 'parquet-metadata' command was not found on PATH.")
    except subprocess.CalledProcessError as e:
        print(f"parquet-metadata exited with status {e.returncode}:")
        print(e.stderr)
    finally:
        # Clean up the dummy file even if the command failed.
        os.remove(dummy_file)
        print(f"Removed dummy Parquet file: {dummy_file}")
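
The quickstart assumes the `parquet-metadata` executable is on `PATH`; if it is not, `subprocess.run` raises `FileNotFoundError`. A minimal sketch of a wrapper that checks for the executable first is shown below. The function name `run_parquet_metadata` and its `executable` parameter are hypothetical, introduced only for illustration, and the sketch assumes the tool takes the file path as its only argument, as in the quickstart.

```python
import shutil
import subprocess


def run_parquet_metadata(parquet_path, executable="parquet-metadata"):
    """Run the CLI on parquet_path and return its stdout.

    Returns None when the executable cannot be found on PATH.
    Raises subprocess.CalledProcessError if the tool exits non-zero.
    """
    if shutil.which(executable) is None:
        # Nothing to run: the tool is not installed or not on PATH.
        return None
    result = subprocess.run(
        [executable, parquet_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Returning `None` for a missing executable lets a caller fall back to another inspection method, while a non-zero exit status still surfaces as an exception so that real failures are not silently ignored.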
