olefile - OLE2 File Parser
The olefile library is a Python package for parsing and reading Microsoft OLE2 files, also known as Structured Storage or Compound Documents, with limited write support (overwriting existing streams with data of the same size). These files are used by older Microsoft Office formats (e.g., .doc, .xls, .ppt, .msg) and act as a file system within a file: storages play the role of directories and streams the role of files. olefile offers low-level access to these streams and storages. The current version is 0.47; releases have focused mainly on bug fixes and robustness.
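Write support in olefile is narrow. The following is a minimal sketch of overwriting an existing stream in place, assuming you work on a copy of the file since it is modified directly. It uses `OleFileIO(..., write_mode=True)` and `write_stream()`, which can only replace a stream with data of exactly the same length; the helper name `overwrite_stream` is hypothetical.

```python
import olefile


def overwrite_stream(path, stream_name, new_data):
    """Overwrite an existing stream in place (hypothetical helper).

    olefile's write_stream() only replaces a stream with data of exactly the
    same length, so the size is checked up front to give a clearer error.
    """
    ole = olefile.OleFileIO(path, write_mode=True)  # opens the file read/write
    try:
        size = ole.get_size(stream_name)  # current stream length in bytes
        if len(new_data) != size:
            raise ValueError(f"{stream_name!r} is {size} bytes, got {len(new_data)}")
        ole.write_stream(stream_name, new_data)
    finally:
        ole.close()
```

For example, `overwrite_stream('copy_of_example.doc', 'WordDocument', patched)` would work only if `patched` has the same length as the original `WordDocument` stream. Creating, deleting, or resizing streams is outside what this call supports.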
Warnings
- breaking The `OleFileIO.meta` attribute was removed in version 0.43. Previously, this attribute provided access to OLE properties (e.g., Author, Title); accessing `ole.meta` now raises an `AttributeError`. Use `ole.get_metadata()` instead, which returns an `OleMetadata` object with attributes such as `author` and `title`.
- gotcha In versions prior to 0.45.1, `olefile.isOleFile()` could return false positives, identifying certain non-OLE files as OLE files. Opening such a file with `OleFileIO` could then fail with parsing errors or behave unexpectedly.
- gotcha `olefile` does not decrypt encrypted or password-protected OLE files. It may still parse the container structure (directory entries and stream names), but the data inside the encrypted streams stays encrypted and cannot be read in plain form through `olefile`.
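The warnings above can be handled together in one defensive helper. The sketch below (the name `inspect_ole` is hypothetical) checks the signature with `isOleFile()`, still guards `OleFileIO` against malformed input, reads document properties through `get_metadata()` rather than any `meta` attribute, and flags likely encryption by looking for `EncryptionInfo` and `EncryptedPackage` streams. That stream-name check is an assumption based on the MS-OFFCRYPTO layout of password-encrypted Office 2007+ files; legacy .doc/.xls/.ppt encryption is signalled inside format-specific streams and is not detected here.

```python
import olefile

# Assumption: password-encrypted Office 2007+ files are OLE containers holding
# these two streams (MS-OFFCRYPTO). This is a heuristic, not a full detector.
ENCRYPTION_STREAMS = ('EncryptionInfo', 'EncryptedPackage')


def inspect_ole(path):
    """Return a dict of basic facts about an OLE file, or None if it cannot be opened as one."""
    try:
        # isOleFile() checks the header signature; OleFileIO can still reject the file
        if not olefile.isOleFile(path):
            return None
        ole = olefile.OleFileIO(path)
    except OSError:
        # Missing/unreadable file, or a structural defect olefile treats as fatal
        return None
    try:
        meta = ole.get_metadata()  # OleMetadata; string fields are bytes or None
        return {
            'author': meta.author,
            'title': meta.title,
            'streams': ['/'.join(p) for p in ole.listdir()],
            'looks_encrypted': all(ole.exists(s) for s in ENCRYPTION_STREAMS),
        }
    finally:
        ole.close()
```

Returning `None` for every "not usable as OLE" case keeps the caller simple; if you need to distinguish a missing file from a malformed one, let the `OSError` propagate instead.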
Install
pip install olefile
Imports
- olefile
import olefile
- OleFileIO
from olefile import OleFileIO
Quickstart
import os
import olefile

# Replace 'example.doc' with the path to a real OLE2 file (e.g. a legacy .doc, .xls or .msg).
# If the file does not exist or is not an OLE file, an explanatory message is printed instead.
ole_file_path = 'example.doc'

# isOleFile() opens the path itself, so a missing file may raise an error
# instead of returning False; check for existence first.
if not os.path.exists(ole_file_path):
    print(f"'{ole_file_path}' does not exist.")
    print("Please provide a valid path to a Microsoft OLE2 Structured Storage file for testing.")
elif not olefile.isOleFile(ole_file_path):
    print(f"'{ole_file_path}' exists but is not a valid OLE file.")
else:
    ole = None
    try:
        # Open the OLE file
        ole = olefile.OleFileIO(ole_file_path)
        print(f"Opened OLE file: {ole_file_path}")
        # List all streams and storages (listdir() returns streams only by default);
        # each entry is a list of path components
        print("\nStreams and Storages:")
        for entry in ole.listdir(streams=True, storages=True):
            print(f"- {'/'.join(entry)}")
        # Check whether a specific stream exists, then read its content
        target_stream = ['WordDocument']  # main stream in Word .doc files
        if ole.exists(target_stream):
            # openstream() returns a file-like object; read() yields bytes
            data = ole.openstream(target_stream).read()
            print(f"\nContent of '{'/'.join(target_stream)}' (first 100 bytes):")
            print(data[:100])
        else:
            print(f"\nStream '{'/'.join(target_stream)}' not found.")
    except Exception as e:
        print(f"Error processing OLE file '{ole_file_path}': {e}")
    finally:
        # Close the file even if an error occurred
        if ole is not None:
            ole.close()
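`listdir()` returns streams only unless `storages=True` is passed. Building on the quickstart, here is a sketch that reports every entry along with its kind and size, using `get_type()` to tell storages from streams and `get_size()` for stream lengths. The helper name `describe_entries` is hypothetical.

```python
import olefile


def describe_entries(path):
    """Return (path, kind, size) tuples for every storage and stream in an OLE file."""
    rows = []
    ole = olefile.OleFileIO(path)
    try:
        # storages=True adds storage (directory-like) entries to the default stream list
        for entry in ole.listdir(streams=True, storages=True):
            name = '/'.join(entry)
            kind = ole.get_type(entry)  # an STGTY_* constant, or False if the entry is missing
            if kind == olefile.STGTY_STREAM:
                rows.append((name, 'stream', ole.get_size(entry)))
            elif kind == olefile.STGTY_STORAGE:
                rows.append((name, 'storage', None))  # storages carry no data of their own
    finally:
        ole.close()
    return rows
```

Usage: `for name, kind, size in describe_entries('example.doc'): print(kind, name, size)`.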