pgzip

0.4.0 · active · verified Thu Apr 16

pgzip is a Python library that provides a multi-threaded implementation of the standard `gzip` module. It aims to be a drop-in replacement that speeds up compression and decompression of large files by processing blocks in parallel. It stores a block index in the gzip header's `FEXTRA` field, so its output remains readable by standard gzip tools. The library is actively maintained; the recent 0.4.0 release indicates ongoing development and support for newer Python versions.
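To make the `FEXTRA` mechanism concrete, the following sketch uses only the standard library to build a gzip member with an extra-field subfield and then read it back, following the header layout in RFC 1952. The subfield ID `ZZ` and its 4-byte payload are hypothetical placeholders, not the layout pgzip actually writes, and the helper names `build_member_with_extra` and `read_extra_subfields` are made up for this illustration.

```python
import gzip
import struct
import zlib

FEXTRA = 0x04  # FLG bit 2 in the gzip header (RFC 1952)


def build_member_with_extra(payload: bytes, subfield_id: bytes, subfield_data: bytes) -> bytes:
    """Build one gzip member whose header carries a single FEXTRA subfield."""
    # Subfield layout: SI1, SI2, LEN (2 bytes, little-endian), then LEN bytes of data.
    extra = subfield_id + struct.pack("<H", len(subfield_data)) + subfield_data
    header = (
        b"\x1f\x8b"                       # ID1, ID2 magic bytes
        + b"\x08"                         # CM = 8 (deflate)
        + bytes([FEXTRA])                 # FLG with only FEXTRA set
        + struct.pack("<I", 0)            # MTIME = 0 (no timestamp)
        + b"\x00"                         # XFL
        + b"\xff"                         # OS = 255 (unknown)
        + struct.pack("<H", len(extra))   # XLEN
        + extra
    )
    # Negative wbits produces a raw deflate stream without a zlib wrapper.
    compressor = zlib.compressobj(6, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = compressor.compress(payload) + compressor.flush()
    trailer = struct.pack("<II", zlib.crc32(payload) & 0xFFFFFFFF, len(payload) & 0xFFFFFFFF)
    return header + body + trailer


def read_extra_subfields(member: bytes) -> dict:
    """Return {subfield_id: data} from the FEXTRA field of the first gzip member."""
    if member[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    if not member[3] & FEXTRA:
        return {}
    (xlen,) = struct.unpack_from("<H", member, 10)  # XLEN follows the 10-byte fixed header
    extra = member[12:12 + xlen]
    subfields = {}
    pos = 0
    while pos + 4 <= len(extra):
        sid = extra[pos:pos + 2]
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        subfields[sid] = extra[pos + 4:pos + 4 + length]
        pos += 4 + length
    return subfields


payload = b"hello parallel gzip " * 100
member = build_member_with_extra(payload, b"ZZ", struct.pack("<I", 1234))  # hypothetical subfield

# A standard reader skips the extra field and still recovers the payload.
assert gzip.decompress(member) == payload
print(read_extra_subfields(member))
```

Because readers that do not understand a subfield simply skip it, an index stored this way costs a few header bytes and does not affect tools that only follow the base gzip format.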

Common errors

Warnings

Install

Imports

Quickstart

This quickstart uses `pgzip.open` to compress data to a file and read it back, and `pgzip.compress` and `pgzip.decompress` to do the same with in-memory bytes. It uses `tempfile` so the example runs as-is and leaves no files behind. The `thread` and `blocksize` parameters control parallelism: `thread` sets the number of worker threads and `blocksize` the size in bytes of each block.

import pgzip
import os
import tempfile

# Create some dummy data
original_data = b"This is a test string that will be compressed using pgzip. " * 1000

with tempfile.TemporaryDirectory() as tmpdir:
    filepath_gz = os.path.join(tmpdir, "test_data.txt.gz")

    # 1. Compress data to a file using 4 threads and 1 MiB blocks (2**20 bytes)
    print(f"Compressing data to {filepath_gz}...")
    with pgzip.open(filepath_gz, "wb", thread=4, blocksize=2**20) as f_out:
        f_out.write(original_data)
    print(f"Compressed file size: {os.path.getsize(filepath_gz)} bytes")

    # 2. Decompress data from the file using 4 threads
    print(f"Decompressing data from {filepath_gz}...")
    with pgzip.open(filepath_gz, "rb", thread=4) as f_in:
        decompressed_data_file = f_in.read()

    assert original_data == decompressed_data_file
    print("File compression and decompression successful!")

# 3. In-memory compression and decompression using default threads
print("\nPerforming in-memory compression/decompression...")
compressed_bytes = pgzip.compress(original_data, compresslevel=6)
decompressed_bytes = pgzip.decompress(compressed_bytes)

assert original_data == decompressed_bytes
print(f"In-memory compression/decompression successful! Compressed size: {len(compressed_bytes)} bytes")
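The compatibility claim can be spot-checked by compressing with pgzip and decompressing with the standard `gzip` module. This is a minimal sketch, assuming pgzip is installed; the helper name `stdlib_can_read_pgzip` is hypothetical, and the import guard is there only so the snippet still loads when pgzip is absent.

```python
import gzip

try:
    import pgzip  # third-party package; must be installed separately
except ImportError:
    pgzip = None  # lets the sketch load without pgzip


def stdlib_can_read_pgzip(data: bytes):
    """Compress with pgzip, decompress with stdlib gzip.

    Returns True/False for the round-trip result, or None if pgzip is not installed.
    """
    if pgzip is None:
        return None
    compressed = pgzip.compress(data, compresslevel=6)
    return gzip.decompress(compressed) == data


result = stdlib_can_read_pgzip(b"interoperability check " * 500)
print("pgzip not installed" if result is None else f"stdlib gzip read pgzip output: {result}")
```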
