Indexed Gzip

1.10.3 · active · verified Thu Apr 16

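The `indexed-gzip` project is a Python extension that provides fast random access to gzip files by building an index of seek points. It is intended as a drop-in replacement for reading with Python's built-in `gzip.GzipFile` class. To seek backwards, `gzip.GzipFile` has to decompress again from the start of the file, and to seek forwards it has to decompress all the intervening data. `indexed-gzip` can instead resume decompression from the nearest seek point, which makes `seek` much faster on large gzipped files. It is currently at version 1.10.3 and is actively maintained with regular releases.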
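
The idea of an index of seek points can be illustrated with a minimal pure-Python sketch that uses only the standard library. This is a simplified stand-in and not `indexed-gzip`'s implementation. Each checkpoint here is a full in-memory copy of the `zlib` decompressor, made with `zlib.Decompress.copy()`. That keeps the sketch short, but it costs more memory than a purpose-built index, and the index cannot be saved to disk. `CHUNK`, `build_checkpoints` and `read_at` are hypothetical names chosen for this example.

```python
import gzip
import zlib

CHUNK = 16 * 1024  # compressed bytes fed to the decompressor per step (arbitrary)


def build_checkpoints(compressed, spacing):
    """Decompress the gzip member once, saving a copy of the decompressor
    state roughly every `spacing` uncompressed bytes.

    Returns a list of (uncompressed offset, compressed offset, state)."""
    d = zlib.decompressobj(wbits=31)  # wbits=31: expect a gzip header/trailer
    points = [(0, 0, d.copy())]
    out_pos, next_mark = 0, spacing
    for in_pos in range(0, len(compressed), CHUNK):
        out_pos += len(d.decompress(compressed[in_pos:in_pos + CHUNK]))
        if out_pos >= next_mark:
            points.append((out_pos, in_pos + CHUNK, d.copy()))
            next_mark = out_pos + spacing
    return points


def read_at(compressed, points, offset, size):
    """Read `size` bytes at uncompressed `offset`, resuming from the last
    checkpoint at or before `offset` instead of from the start of the data."""
    out_pos, in_pos, saved = max(
        (p for p in points if p[0] <= offset), key=lambda p: p[0])
    d = saved.copy()  # copy again so the stored checkpoint stays reusable
    buf = bytearray()
    while out_pos + len(buf) < offset + size and in_pos < len(compressed):
        buf += d.decompress(compressed[in_pos:in_pos + CHUNK])
        in_pos += CHUNK
    start = offset - out_pos
    return bytes(buf[start:start + size])


data = b"".join(b"line %d\n" % i for i in range(100000))
gz = gzip.compress(data)
points = build_checkpoints(gz, spacing=64 * 1024)
print(read_at(gz, points, 500000, 12) == data[500000:500012])  # True
```

The `spacing` parameter trades memory for speed. Denser checkpoints mean less data has to be decompressed after each seek, but more state has to be stored.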

Install

pip install indexed_gzip

Imports

import indexed_gzip as igzip

Quickstart

This example opens a gzip file with `IndexedGzipFile`, performs random `seek` and `read` operations, and then explicitly builds a full index. For reading, `IndexedGzipFile` is used exactly like `gzip.GzipFile`, but non-sequential access is much faster.

import gzip
import os

import indexed_gzip as igzip

# Create a sample gzip file for demonstration
dummy_data = b"This is some sample data for a gzipped file.\nRepeat this line many times to make it bigger.\n" * 10000
with gzip.open('test_file.gz', 'wb') as g:
    g.write(dummy_data)

try:
    # Open the gzip file; IndexedGzipFile supports the usual file-object methods
    with igzip.IndexedGzipFile('test_file.gz') as fobj:
        print(f"Uncompressed data size: {len(dummy_data)} bytes")

        # Seek to an arbitrary position and read
        fobj.seek(15000)
        data = fobj.read(100)
        print(f"Read 100 bytes from offset 15000: {data.decode(errors='ignore')[:50]}...")

        # Seek backwards to another position
        fobj.seek(5000)
        data = fobj.read(50)
        print(f"Read 50 bytes from offset 5000: {data.decode(errors='ignore')[:50]}...")

        # Build the full index explicitly (optional; by default the index
        # is extended on demand as the file is read)
        fobj.build_full_index()
        print("Full index built.")

finally:
    # Clean up the sample file
    if os.path.exists('test_file.gz'):
        os.remove('test_file.gz')
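
`IndexedGzipFile` is meant to be a drop-in replacement for reading, so code that relies only on `seek()` and `read()` can use it when it is installed and fall back to `gzip.GzipFile` when it is not. The following is a minimal sketch of that pattern. `open_gz` and `read_slices` are hypothetical helper names, and the fallback assumes that slower backward seeks are acceptable when `indexed_gzip` is unavailable.

```python
import gzip
import os
import tempfile

try:
    import indexed_gzip as igzip  # optional dependency

    def open_gz(path):
        # Fast random access when indexed_gzip is installed
        return igzip.IndexedGzipFile(path)
except ImportError:
    def open_gz(path):
        # Fallback: stdlib reader; backward seeks decompress from the start
        return gzip.GzipFile(path, mode="rb")


def read_slices(path, requests):
    """Return the bytes for each (offset, size) pair, in request order.

    Only seek() and read() are used, so either reader works unchanged."""
    with open_gz(path) as fobj:
        out = []
        for offset, size in requests:
            fobj.seek(offset)
            out.append(fobj.read(size))
        return out


# Usage: 50,000 fixed-width records of 14 bytes each ("record 000000\n")
data = b"".join(b"record %06d\n" % i for i in range(50000))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.gz")
    with gzip.open(path, "wb") as g:
        g.write(data)
    print(read_slices(path, [(140000, 14), (14, 14)]))
    # [b'record 010000\n', b'record 000001\n']
```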
