Compound Files

0.3 · maintenance · verified Wed Apr 15

compoundfiles is a Python library for parsing and reading OLE Compound Documents, the legacy container format used by Microsoft applications such as MS Office before Office 2007. It provides a simple, read-only API for accessing the streams and storages inside these files. The current stable version is 0.3, last released in 2017; with no active development since then, the project is best treated as being in maintenance mode or abandoned.

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to open an OLE Compound Document, list its root entries (streams and storages), and attempt to read content from a common stream like 'WordDocument'. It requires a path to an existing OLE file to function properly.

import os
import compoundfiles

# IMPORTANT: Replace './example.doc' with the path to an OLE Compound Document
# (e.g., a pre-2007 .doc, .xls, or .ppt file), or set OLE_FILE_PATH instead.
# This library only reads existing OLE files, so a real sample file is required.
file_path = os.environ.get('OLE_FILE_PATH', './example.doc')

if not os.path.exists(file_path):
    print(f"Warning: OLE file not found at '{file_path}'. Please provide a valid path.")
    print("Set the OLE_FILE_PATH environment variable or replace the hardcoded path.")
else:
    try:
        # CompoundFileReader is the reader class exposed by compoundfiles 0.3.
        with compoundfiles.CompoundFileReader(file_path) as doc:
            print(f"Successfully opened OLE Compound Document: {file_path}")

            # doc.root is the root storage; iterating it yields its direct children.
            # Index them by name once instead of re-listing the root for every lookup.
            entries = {entity.name: entity for entity in doc.root}
            print("\nRoot entries:")
            for name, entity in entries.items():
                kind = 'storage' if entity.isdir else 'stream'
                print(f"- {name} ({kind})")

            # Example: read a specific stream if it exists.
            # 'WordDocument' is the main stream in legacy Word documents.
            if 'WordDocument' in entries:
                with doc.open(entries['WordDocument']) as stream:
                    # The content is binary, so only show the first 50 bytes.
                    content = stream.read(50)
                    print(f"\nContent of '/WordDocument' stream (first 50 bytes): {content!r}")
            else:
                print("\nNo 'WordDocument' stream found. Checking other common streams...")
                for common_stream in ['Workbook', 'PowerPoint Document', 'Contents']:
                    if common_stream in entries:
                        print(f"Found stream: '{common_stream}'")

    except Exception as e:
        # Broad catch keeps the demo simple; parse errors and I/O errors both land here.
        print(f"\nError processing OLE file '{file_path}': {e}")
        print("Ensure the file is a valid OLE Compound Document and not corrupted.")
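
The quickstart only looks at the root level, but storages can contain further storages and streams. The following is a minimal sketch of a recursive walk, assuming each entry exposes a `name`, an `isdir` flag, and yields its children when iterated. The helper `iter_streams` and the stand-in class `FakeEntity` are hypothetical names introduced for illustration; they are not part of the library. The stand-in lets the sketch run without an OLE file on hand.

```python
def iter_streams(entity, prefix=""):
    """Yield (path, entity) for every stream below `entity`.

    Assumes each entity has a `name`, an `isdir` flag, and yields its
    children when iterated (an assumption about the entry objects).
    """
    for child in entity:
        path = f"{prefix}/{child.name}"
        if child.isdir:
            # Storages behave like directories: descend into them.
            yield from iter_streams(child, path)
        else:
            yield path, child


class FakeEntity:
    """Hypothetical stand-in for a parsed directory entry."""

    def __init__(self, name, children=None):
        self.name = name
        self.isdir = children is not None  # a child list marks a storage
        self._children = children or []

    def __iter__(self):
        return iter(self._children)


root = FakeEntity("Root Entry", [
    FakeEntity("WordDocument"),
    FakeEntity("ObjectPool", [FakeEntity("Ole10Native")]),
])
paths = [path for path, _ in iter_streams(root)]
print(paths)
```

With a real document, the same function could presumably be called on the root storage of an open reader in place of `root`; treat that as an assumption to verify against the installed version.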

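When opening fails, a quick way to rule out the wrong file type is to check the 8-byte signature that every OLE Compound Document starts with (D0 CF 11 E0 A1 B1 1A E1). The sketch below uses only the standard library; `looks_like_ole` and `OLE_MAGIC` are hypothetical names, not part of compoundfiles. A matching signature shows only that the header looks right, not that the rest of the file is intact.

```python
import os
import tempfile

# Fixed 8-byte signature at the start of every OLE Compound Document.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole(path):
    """Return True if the file at `path` begins with the OLE signature."""
    with open(path, "rb") as f:
        return f.read(len(OLE_MAGIC)) == OLE_MAGIC


# Demonstrate with two throwaway files: one with the signature, one without.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.doc")
    bad = os.path.join(tmp, "bad.doc")
    with open(good, "wb") as f:
        f.write(OLE_MAGIC + b"\x00" * 504)
    with open(bad, "wb") as f:
        f.write(b"PK\x03\x04 not an OLE file")
    results = (looks_like_ole(good), looks_like_ole(bad))

print(results)
```

Running this check before handing the path to the library gives a clearer message for the common case of a newer ZIP-based Office file (.docx, .xlsx, .pptx) being passed in by mistake.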