{"id":8412,"library":"pgzip","title":"pgzip","description":"pgzip is a Python library that provides a multi-threaded implementation of the standard `gzip` module. It aims to be a drop-in replacement that speeds up compression and decompression of large files through parallel processing. It does this by storing a block index in the gzip file's `FEXTRA` field, so its output remains compatible with standard gzip tools. The library is actively maintained; the recent 0.4.0 release indicates ongoing development and support for newer Python versions.","status":"active","version":"0.4.0","language":"en","source_language":"en","source_url":"https://github.com/pgzip/pgzip","tags":["compression","decompression","gzip","parallel processing","multithreading","performance"],"install":[{"cmd":"pip install pgzip","lang":"bash","label":"Install latest version"}],"dependencies":[],"imports":[{"note":"The library typically replaces direct usage of the standard `gzip` module. Its `open`, `compress`, and `decompress` functions mirror `gzip`'s API.","symbol":"pgzip","correct":"import pgzip"},{"note":"Used for file-like operations; accepts `thread` and `blocksize` parameters for parallelization.","symbol":"open","correct":"with pgzip.open('file.gz', 'wb') as f: ..."},{"note":"Compresses bytes in memory; also accepts `thread` and `compresslevel`.","symbol":"compress","correct":"compressed_data = pgzip.compress(data)"},{"note":"Decompresses bytes in memory; also accepts `thread`.","symbol":"decompress","correct":"decompressed_data = pgzip.decompress(compressed_data)"}],"quickstart":{"code":"import pgzip\nimport os\nimport tempfile\n\n# Create some dummy data\noriginal_data = b\"This is a test string that will be compressed using pgzip. \" * 1000\n\nwith tempfile.TemporaryDirectory() as tmpdir:\n    filepath_gz = os.path.join(tmpdir, \"test_data.txt.gz\")\n\n    # 1. 
Compress data to a file using 4 threads and 1MB blocks\n    print(f\"Compressing data to {filepath_gz}...\")\n    with pgzip.open(filepath_gz, \"wb\", thread=4, blocksize=2**20) as f_out:\n        f_out.write(original_data)\n    print(f\"Compressed file size: {os.path.getsize(filepath_gz)} bytes\")\n\n    # 2. Decompress data from the file using 4 threads\n    print(f\"Decompressing data from {filepath_gz}...\")\n    with pgzip.open(filepath_gz, \"rb\", thread=4) as f_in:\n        decompressed_data_file = f_in.read()\n\n    assert original_data == decompressed_data_file\n    print(\"File compression and decompression successful!\")\n\n# 3. In-memory compression and decompression using default threads\nprint(\"\\nPerforming in-memory compression/decompression...\")\ncompressed_bytes = pgzip.compress(original_data, compresslevel=6)\ndecompressed_bytes = pgzip.decompress(compressed_bytes)\n\nassert original_data == decompressed_bytes\nprint(f\"In-memory compression/decompression successful! Compressed size: {len(compressed_bytes)} bytes\")\n","lang":"python","description":"This quickstart demonstrates how to use `pgzip.open` for compressing and decompressing data to/from a file, as well as `pgzip.compress` and `pgzip.decompress` for in-memory byte manipulation. It utilizes `tempfile` for a runnable, clean example. Note the use of `thread` and `blocksize` parameters for parallelization."},"warnings":[{"fix":"Users on Python < 3.10 must upgrade their Python environment to 3.10 or newer, or downgrade `pgzip` to a compatible version (e.g., `pgzip==0.3.5`).","message":"pgzip v0.4.0 dropped support for Python 3.7, 3.8, and 3.9. It now officially supports Python versions 3.10 through 3.14.","severity":"breaking","affected_versions":">=0.4.0"},{"fix":"For very small files, consider using Python's built-in `gzip` module. For larger files, `pgzip` offers significant speedups. 
Tune the `blocksize` and `thread` parameters for your specific workload and data size.","message":"While `pgzip` is designed for performance, its parallel processing overhead can make it slower than the standard `gzip` module for files or data streams smaller than approximately 1 MB.","severity":"gotcha","affected_versions":"All"},{"fix":"Review the official `pgzip` documentation for a precise list of supported features. If you rely on advanced `gzip` functionality, test `pgzip` thoroughly or consider alternative strategies.","message":"pgzip replaces only the `GzipFile` class and the `open()`, `compress()`, and `decompress()` functions of the standard `gzip` module. Other behavior, such as `seek()` and `tell()` on the file object, might not be fully supported or tested.","severity":"gotcha","affected_versions":"All"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Upgrade `pgzip` to version 0.3.5 or newer: `pip install --upgrade pgzip`.","cause":"This error occurred in `pgzip` versions before 0.3.5 when used with Python 3.11, because of internal changes in Python's `gzip` module that `pgzip` relied on.","error":"AttributeError: '_io.BufferedReader' object has no attribute '_read_exact'"},{"fix":"Upgrade your Python environment to version 3.10 or newer. If upgrading Python is not feasible, downgrade `pgzip` to a compatible version, for example `pip install pgzip==0.3.5`.","cause":"Attempting to use `pgzip` version 0.4.0 or later with an unsupported Python version (e.g., Python 3.7, 3.8, or 3.9).","error":"RuntimeError: Python version 3.x.y is not supported by pgzip 0.4.0. Requires >=3.10"}]}