{"id":8230,"library":"indexed-gzip","title":"Indexed Gzip","description":"The `indexed-gzip` project is a Python extension providing fast random access to gzip files by building an index of seek points. For reading, it acts as a drop-in replacement for Python's built-in `gzip.GzipFile` class, significantly improving performance for `seek` operations on large gzipped files. It is currently at version 1.10.3 and is actively maintained with regular releases.","status":"active","version":"1.10.3","language":"en","source_language":"en","source_url":"https://github.com/pauldmccarthy/indexed_gzip","tags":["gzip","compression","random-access","file-io","performance","nifti"],"install":[{"cmd":"pip install indexed-gzip","lang":"bash","label":"Install via pip"}],"dependencies":[{"reason":"Required for building from source, but not a runtime dependency for pre-built wheels.","package":"Cython","optional":true}],"imports":[{"note":"Commonly imported as 'igzip' for brevity: `import indexed_gzip as igzip` then `igzip.IndexedGzipFile`.","symbol":"IndexedGzipFile","correct":"from indexed_gzip import IndexedGzipFile"}],"quickstart":{"code":"import gzip\nimport os\n\nimport indexed_gzip as igzip\n\n# Create a dummy gzip file for demonstration\ndummy_data = b\"This is some sample data for a gzipped file.\\nRepeat this line many times to make it bigger.\\n\" * 10000\nwith gzip.open('test_file.gz', 'wb') as g:\n    g.write(dummy_data)\n\n# Open the indexed gzip file\ntry:\n    with igzip.IndexedGzipFile('test_file.gz') as fobj:\n        print(f\"Uncompressed data size: {len(dummy_data)} bytes\")\n\n        # Seek to an arbitrary position\n        fobj.seek(15000)\n        data = fobj.read(100)\n        print(f\"Read 100 bytes from offset 15000: {data.decode(errors='ignore')[:50]}...\")\n\n        # Seek to another position\n        fobj.seek(5000)\n        data = fobj.read(50)\n        print(f\"Read 50 bytes from offset 5000: 
{data.decode(errors='ignore')[:50]}...\")\n\n        # Build a full index explicitly (optional; otherwise it is built on demand)\n        fobj.build_full_index()\n        print(\"Full index built.\")\n\nfinally:\n    # Clean up the dummy file\n    if os.path.exists('test_file.gz'):\n        os.remove('test_file.gz')","lang":"python","description":"This example demonstrates how to open a gzip file with `IndexedGzipFile`, perform random `seek` and `read` operations, and explicitly build a full index. It shows how `indexed-gzip` acts as a drop-in replacement for `gzip.GzipFile` for read operations, but with significantly improved performance for non-sequential access."},"warnings":[{"fix":"Use Python's built-in `gzip` module or another library for writing gzipped files. Then, open the saved file with `indexed-gzip` for fast random access.","message":"The `IndexedGzipFile` class is a read-only interface and does not support writing data. Opening a file in write mode or calling a write method results in an error.","severity":"breaking","affected_versions":"All versions"},{"fix":"Tune the `spacing` parameter (e.g., `IndexedGzipFile(filename, spacing=65536)`) based on your application's read patterns and memory constraints. For very fine-grained seeking, a smaller value might be beneficial, while for large files with infrequent seeks, a larger value saves memory.","message":"The `spacing` parameter of `IndexedGzipFile` controls the density of seek points in the index. A smaller `spacing` improves seek performance but increases the memory used by the index, and vice versa. The default is 1 MiB (1048576 bytes).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Upgrade to `indexed-gzip` version 1.10.2 or newer. 
If upgrading is not possible, convert `pathlib.Path` objects to strings using `str(path_obj)` before passing them to `IndexedGzipFile`.","message":"Prior to version 1.10.2, passing `pathlib.Path` objects directly to `IndexedGzipFile` for the filename argument might not have been fully supported, potentially leading to errors or unexpected behavior.","severity":"deprecated","affected_versions":"<1.10.2"},{"fix":"Upgrade to `indexed-gzip` version 1.10.0 or newer to avoid this specific data corruption/read error. If you cannot upgrade, keep CRC validation enabled (`skip_crc_check=False`, the default), and be aware that otherwise valid GZIP files whose footer happens to contain the magic bytes may trigger the bug when validation is disabled.","message":"In versions prior to 1.10.0, a bug could be triggered when CRC validation was disabled, particularly on GZIP streams whose footer contained bytes matching the GZIP magic bytes `0x1f 0x8b`.","severity":"gotcha","affected_versions":"<1.10.0"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"The `IndexedGzipFile` is strictly read-only. For writing gzipped files, use Python's built-in `gzip` module (e.g., `gzip.GzipFile('file.gz', 'wb')`) or another appropriate library. Once written, the file can be opened with `indexed-gzip` for efficient random access.","cause":"Attempting to open an `IndexedGzipFile` in write mode ('w', 'wb', 'a', etc.) or calling write methods on it.","error":"IOError: No write support for IndexedGzipFile"},{"fix":"Upgrade `indexed-gzip` to version 1.10.2 or later. Alternatively, convert the `pathlib.Path` object to a string before passing it: `igzip.IndexedGzipFile(str(my_path_obj))`.","cause":"Passing a `pathlib.Path` object directly as the `filename` argument to `IndexedGzipFile` in an older version (<1.10.2) that did not explicitly support `pathlib.Path`.","error":"TypeError: argument of type 'Path' is not iterable"},{"fix":"Replace `gzip.GzipFile` with `indexed_gzip.IndexedGzipFile`. 
It builds an index of seek points, which makes random `seek` operations much faster. For example, `import indexed_gzip as igzip`, then call `igzip.IndexedGzipFile(...)` wherever you previously opened the file for reading with `gzip.GzipFile(...)`.","cause":"The standard `gzip.GzipFile` class must decompress from the beginning of the file up to the desired seek point, making random access inefficient, especially for large files.","error":"Extremely slow seek() operations when using `gzip.GzipFile` on large files."}]}