Compound Files
`compoundfiles` is a Python library for parsing and reading OLE Compound Documents, the legacy container format used by Microsoft applications such as MS Office before Office 2007. It provides a simple API for accessing the streams and storages inside these files. The latest release is 0.3, published in 2017; with no releases since then, the project appears to be unmaintained.
Warnings
- gotcha This library is designed for reading legacy OLE Compound Documents (e.g., .doc, .xls, .ppt files from Office 97-2003). It does NOT support modern Office Open XML (OOXML) formats like .docx, .xlsx, or .pptx.
- gotcha The `compoundfiles` library has not been updated since 2017 (version 0.3). It is not actively maintained, which means there may be unpatched bugs, security vulnerabilities, or compatibility issues with newer Python versions or operating systems.
- gotcha Parsing malformed or corrupted OLE Compound Documents can lead to exceptions or unexpected behavior, as the library may not handle all edge cases gracefully.
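The format and corruption warnings above can be partly guarded against with a cheap pre-check before a file is handed to the library. The following is a minimal sketch that uses only the standard library. It relies on two well-known signatures: OLE Compound Documents begin with the 8 bytes D0 CF 11 E0 A1 B1 1A E1, while OOXML files are ZIP archives that begin with `PK\x03\x04`. The function name `sniff_office_container` is hypothetical and is not part of `compoundfiles`.

```python
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE Compound Document signature
ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header, used by OOXML containers


def sniff_office_container(path):
    """Return 'ole', 'zip', or 'unknown' based on the file's leading bytes.

    Hypothetical helper: it inspects only the first 8 bytes and does not
    validate the rest of the file.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if header == OLE_MAGIC:
        return "ole"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```

A result of `'ole'` only means the signature matches; a truncated or otherwise damaged file can still fail later during parsing, so error handling around the actual read remains necessary.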
Install
pip install compoundfiles
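Because the package is old, it can be useful to confirm which version actually ended up in the environment. A minimal sketch, assuming Python 3.8 or newer for `importlib.metadata`; the helper name `installed_version` is hypothetical and not part of `compoundfiles`.

```python
from importlib import metadata


def installed_version(package="compoundfiles"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

After a successful install, `installed_version()` should return a version string rather than `None`.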
Imports
- CompoundFileReader
from compoundfiles import CompoundFileReader
Quickstart
import os

from compoundfiles import CompoundFileReader

# Path to an existing OLE Compound Document (e.g., a legacy .doc, .xls, or .ppt file).
# This library only reads existing files, so a real sample file is required.
# Set the OLE_FILE_PATH environment variable or edit the fallback path below.
file_path = os.environ.get('OLE_FILE_PATH', './example.doc')

if not os.path.exists(file_path):
    print(f"Warning: OLE file not found at '{file_path}'. Please provide a valid path.")
    print("Set the OLE_FILE_PATH environment variable or replace the fallback path.")
else:
    try:
        with CompoundFileReader(file_path) as doc:
            print(f"Successfully opened OLE Compound Document: {file_path}")

            # Map each top-level entity name to its entity (streams and storages).
            entries = {entry.name: entry for entry in doc.root}
            print("\nRoot entries:")
            for name in entries:
                print(f"- {name}")

            # 'WordDocument' is a common stream name in OLE Word documents.
            if 'WordDocument' in entries:
                with doc.open(entries['WordDocument']) as stream:
                    # The stream content is binary, so show only the first 50 bytes.
                    content = stream.read(50)
                    print(f"\nContent of 'WordDocument' stream (first 50 bytes): {content!r}")
            else:
                print("\nNo 'WordDocument' stream found. Checking other common streams...")
                for common_stream in ['Workbook', 'PowerPoint Document', 'Contents']:
                    if common_stream in entries:
                        print(f"Found stream: '{common_stream}'")
    except Exception as e:
        # Malformed or corrupted files can fail in several ways, so catch broadly here.
        print(f"\nError processing OLE file '{file_path}': {e}")
        print("Ensure the file is a valid OLE Compound Document and not corrupted.")
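The quickstart lists only the root level, but storages can nest. The sketch below walks the whole tree depth first. It is written against duck-typed objects and makes these assumptions about the library's entity objects: each has a `.name` string and an `.isdir` flag, and a storage is iterable over its children. The function name `walk_entities` is hypothetical and not part of `compoundfiles`.

```python
def walk_entities(entity, prefix=""):
    """Yield (path, is_storage) for every entity below `entity`, depth first.

    Assumes each child exposes `.name` and `.isdir`, and that storages
    are iterable over their children.
    """
    for child in entity:
        path = f"{prefix}/{child.name}"
        yield path, child.isdir
        if child.isdir:
            # Recurse into nested storages, extending the path prefix.
            yield from walk_entities(child, path)
```

Assuming the opened document exposes its root storage as `.root`, a call such as `for path, is_storage in walk_entities(doc.root): print(path)` would print one '/'-separated path per stream or storage.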