{"id":8400,"library":"parquet-metadata","title":"Parquet Metadata Tool","description":"parquet-metadata is a Python-based command-line tool, version 0.0.1, designed to display metadata about a Parquet file. It provides insights into the file's structure, columns, row groups, and basic statistics. The project appears to have been unmaintained since its last release in 2018, making it primarily a historical reference rather than an actively developed library for programmatic use.","status":"abandoned","version":"0.0.1","language":"en","source_language":"en","source_url":"https://github.com/cldellow/parquet-metadata","tags":["parquet","metadata","cli-tool","data-format","unmaintained"],"install":[{"cmd":"pip install parquet-metadata","lang":"bash","label":"Install via pip"}],"dependencies":[],"imports":[{"note":"The 'parquet-metadata' package (v0.0.1) is not designed to be imported as a library. Its core functionality is exposed through a command-line script. Importing functions directly from the underlying `parquet_metadata.py` script is an unstable pattern that is not officially supported and may break without notice.","wrong":"from parquet_metadata import some_function","symbol":"NotApplicable","correct":"This package is primarily a command-line tool."}],"quickstart":{"code":"import os\nimport subprocess\n\n# Create a small Parquet file for demonstration (requires pyarrow).\n# In a real scenario, point the tool at an existing Parquet file instead.\ntry:\n    import pyarrow as pa\n    import pyarrow.parquet as pq\nexcept ImportError:\n    print(\"pyarrow is not installed, so the demo Parquet file cannot be created.\")\n    print(\"Install it with: pip install pyarrow\")\n    print(\"You can still run 'parquet-metadata your_file.parquet' in your terminal.\")\nelse:\n    dummy_file = 'dummy.parquet'\n    table = pa.table({'col1': [1, 2, 3], 'col2': ['a', 'b', 'c']})\n    pq.write_table(table, dummy_file)\n    print(f\"Created dummy Parquet file: {dummy_file}\")\n    try:\n        # Run the parquet-metadata command-line tool on the file\n        command = ['parquet-metadata', dummy_file]\n        result = subprocess.run(command, capture_output=True, text=True, check=True)\n        print(\"\\n--- Parquet Metadata Output ---\")\n        print(result.stdout)\n        if result.stderr:\n            print(\"--- stderr ---\")\n            print(result.stderr)\n    except FileNotFoundError:\n        # Raised when the executable is not on PATH\n        print(\"'parquet-metadata' was not found on PATH; check your installation.\")\n    except subprocess.CalledProcessError as e:\n        # The tool exited with a non-zero status; show what it reported\n        print(f\"parquet-metadata exited with status {e.returncode}:\")\n        print(e.stderr)\n    finally:\n        # Always clean up the dummy file, even if the tool failed\n        os.remove(dummy_file)\n        print(f\"Removed dummy Parquet file: {dummy_file}\")","lang":"python","description":"The `parquet-metadata` package is intended for command-line use. This quickstart shows how to invoke the `parquet-metadata` CLI from Python with `subprocess` to inspect a Parquet file's metadata. It first creates a small dummy Parquet file (this step requires `pyarrow`) and removes it afterwards, even if the CLI call fails."},"warnings":[{"fix":"Consider actively maintained alternatives for up-to-date Parquet file analysis, such as `pyarrow` (e.g., `pyarrow.parquet.read_metadata('file.parquet')` for programmatic access), `parquet-pages` (also for programmatic metadata access), `parquet-tools`, or `parq-cli`.","message":"The `parquet-metadata` package (version 0.0.1) was last updated in 2018 and is no longer actively maintained. It lacks the features and bug fixes found in newer Parquet introspection tools.","severity":"deprecated","affected_versions":"<=0.0.1"},{"fix":"If programmatic access to Parquet metadata is required, use a library designed for that purpose, such as `pyarrow` (`pyarrow.parquet.ParquetFile(path).metadata` exposes schema, row-group, and column-chunk metadata) or `parquet-pages`, which exposes Thrift structs for detailed metadata inspection. If you need to run the CLI tool, use `subprocess.run()` as shown in the quickstart.","message":"This package is a command-line utility and does not provide a stable Python API for programmatic interaction. Direct imports from its internal scripts are not recommended or supported.","severity":"gotcha","affected_versions":"<=0.0.1"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"This package is a command-line tool. Instead of importing it, run it from your terminal: `parquet-metadata your_file.parquet`. If you must run it from Python, use `subprocess.run(['parquet-metadata', 'your_file.parquet'])`.","cause":"Attempting to import `parquet_metadata` as a Python library module; the package is designed to be used as a command-line tool, not imported.","error":"ModuleNotFoundError: No module named 'parquet_metadata'"},{"fix":"Ensure `pip install parquet-metadata` completed successfully. Verify that your `PATH` environment variable includes the directory where pip installs scripts (e.g., `~/.local/bin` or a virtual environment's `bin`/`Scripts` directory). Reinstalling in a fresh virtual environment often helps.","cause":"The `parquet-metadata` executable is not on your system's PATH, usually because of an incomplete `pip install` or an environment issue.","error":"parquet-metadata: command not found"}]}