olefile - OLE2 File Parser

0.47 · active · verified Thu Apr 09

The olefile library is a Python package for parsing, reading, and writing Microsoft OLE2 files, also known as Structured Storage or Compound Documents. This container format underlies older Microsoft Office files (e.g., .doc, .xls, .ppt, .msg) and works like a file system within a file: storages act as directories and streams as files. olefile offers low-level access to those streams and storages. The current version is 0.47; releases focus on bug fixes and robustness.
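
To make the idea of an OLE2 container concrete, here is a minimal standard-library sketch that tests whether some bytes begin with the 8-byte OLE2 signature. The names `OLE_MAGIC`, `looks_like_ole`, and `file_looks_like_ole` are hypothetical helpers written for illustration; this is not olefile's implementation, and the library's own `olefile.isOleFile` check may do more than compare the signature.

```python
# The 8-byte signature found at the start of an OLE2 compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole(data: bytes) -> bool:
    """Return True if the byte string starts with the OLE2 signature."""
    return data[:len(OLE_MAGIC)] == OLE_MAGIC


def file_looks_like_ole(path: str) -> bool:
    """Read only the first 8 bytes of a file and compare them to the signature."""
    with open(path, "rb") as fp:
        return looks_like_ole(fp.read(len(OLE_MAGIC)))
```

A matching signature only suggests that the file is an OLE2 container; it does not prove that the internal directory structure is intact, which is why opening the file with olefile is still wrapped in error handling in the quickstart.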

Warnings

Install

Imports

Quickstart

This quickstart demonstrates how to open an OLE file, check if it's a valid OLE structure, list its internal streams and storages, and read content from a specific stream. Replace 'example.doc' with the actual path to your OLE file (e.g., a .doc, .xls, .ppt, or .msg file) for a real test.

import olefile
import os

# For a real test, replace 'example.doc' below with the path to an actual OLE file.
# This example uses a placeholder path and demonstrates the basic API.
# If the file does not exist or is not an OLE file, appropriate messages will be printed.

ole_file_path = 'example.doc' # Replace with a path to a real OLE file

# Check that the path exists first: isOleFile() raises an OSError for a missing file.
if os.path.isfile(ole_file_path) and olefile.isOleFile(ole_file_path):
    try:
        # Open the OLE file; the context manager closes it even if an error occurs
        with olefile.OleFileIO(ole_file_path) as ole:
            print(f"Opened OLE file: {ole_file_path}")

            # List all streams and storages (listdir() returns only streams by default)
            print("\nStreams and Storages:")
            for entry_path in ole.listdir(streams=True, storages=True):
                # Each entry is a list of path components, e.g. ['Macros', 'VBA', 'dir']
                print(f"- {'/'.join(entry_path)}")

            # Example: check if a specific stream exists and read its content
            target_stream = ['WordDocument']  # Common stream in Word docs
            if ole.exists(target_stream):
                # Read stream content (returns bytes)
                data = ole.openstream(target_stream).read()
                print(f"\nContent of '{'/'.join(target_stream)}' (first 100 bytes):")
                print(data[:100])
            else:
                print(f"\nStream '{'/'.join(target_stream)}' not found.")

    except Exception as e:
        print(f"Error processing OLE file '{ole_file_path}': {e}")
elif os.path.exists(ole_file_path):
    print(f"'{ole_file_path}' exists but is not a valid OLE file.")
else:
    print(f"'{ole_file_path}' does not exist.")
    print("Please provide a valid path to a Microsoft OLE2 Structured Storage file for testing.")
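
The listing and reading steps above can be combined into one small helper. The following is a sketch, not part of olefile: `summarize_streams` and its `preview_bytes` parameter are hypothetical names, and the function only assumes that the object passed in provides `listdir()` and `openstream()` as `olefile.OleFileIO` does.

```python
def summarize_streams(ole, preview_bytes=16):
    """Map each stream path (joined with '/') to (size, leading bytes).

    `ole` is assumed to behave like olefile.OleFileIO: listdir() returns
    a list of paths, each path being a list of name components, and
    openstream(path) returns a file-like object.
    """
    summary = {}
    for path in ole.listdir():
        data = ole.openstream(path).read()  # whole stream content as bytes
        summary["/".join(path)] = (len(data), data[:preview_bytes])
    return summary


# Usage, assuming olefile is installed and 'example.doc' is a real OLE file:
#
#     import olefile
#     with olefile.OleFileIO('example.doc') as ole:
#         for name, (size, head) in summarize_streams(ole).items():
#             print(name, size, head)
```

Reading every stream in full keeps the sketch short; for large files it may be preferable to read only the first `preview_bytes` bytes of each stream.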
