Parquet

1.3.1 · abandoned · verified Wed Apr 15

The `parquet` library (parquet-python) is a pure-Python implementation of a reader for the Apache Parquet file format. As of its last release (version 1.3.1), it offers read-only support: it can extract data from Parquet files as JSON or TSV, but it cannot write them. The project explicitly states that performance has not been optimized and that many features, including writing, are not implemented. Development on GitHub appears to have ceased in 2017 and the last PyPI upload was in 2020, so the project should be treated as unmaintained.

Warnings

Install

Imports

Quickstart

The quickstart reads a Parquet file in two ways: `parquet.DictReader` yields each row as a dictionary, and `parquet.reader` yields each row as a list. Both calls take a `columns` argument that restricts the output to the named columns. The library is strictly read-only, so it cannot create Parquet files; you must supply an existing file, named `test.parquet` in this example.

import json

import parquet

# This library is read-only, so 'test.parquet' must already exist.
# Replace 'test.parquet' with the path to your own file.
#
# The example assumes a file with two rows and three columns:
#   foo  bar  baz
#   1    2    3
#   4    5    6
try:
    with open("test.parquet", "rb") as fo:
        print("\nReading 'test.parquet' with DictReader (columns 'foo', 'bar'):")
        for row in parquet.DictReader(fo, columns=['foo', 'bar']):
            print(json.dumps(row))

    with open("test.parquet", "rb") as fo:
        print("\nReading 'test.parquet' with reader (columns 'foo', 'bar'):")
        for row in parquet.reader(fo, columns=['foo', 'bar']):
            print(",".join([str(r) for r in row]))
except FileNotFoundError:
    print("Error: 'test.parquet' not found. Please create one for testing.")
except Exception as e:
    print(f"An error occurred: {e}")
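
Because the quickstart depends on a file you supply yourself, a quick sanity check before handing it to the reader can save some confusion. The sketch below relies only on the Parquet format's 4-byte `PAR1` marker, which appears at both the start and the end of a standard Parquet file. The helper name `looks_like_parquet` is hypothetical and is not part of the `parquet` package; the check is a heuristic and does not validate the file's contents.

```python
import os

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a standard Parquet file


def looks_like_parquet(path):
    """Return True if the file at `path` starts and ends with the Parquet magic bytes.

    Hypothetical helper for illustration; not part of the `parquet` package.
    """
    # Smallest plausible layout: header magic + 4-byte footer length + footer magic.
    if not os.path.isfile(path) or os.path.getsize(path) < 12:
        return False
    with open(path, "rb") as fo:
        head = fo.read(4)
        fo.seek(-4, os.SEEK_END)  # jump to the last four bytes
        tail = fo.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


if __name__ == "__main__":
    if looks_like_parquet("test.parquet"):
        print("'test.parquet' has the Parquet magic bytes.")
    else:
        print("'test.parquet' is missing or does not look like a Parquet file.")
```

A file that passes this check can still be one the library fails to read, since many format features are not implemented, so the `try`/`except` around the reader calls remains useful.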
