OleFileIO_PL for Microsoft OLE2 Files
OleFileIO_PL is a Python package designed to parse, read, and write Microsoft OLE2 files, also known as Structured Storage or Compound Document files (e.g., older Microsoft Office formats like .doc, .xls, .ppt). It is an improved version of the original OleFileIO module from the Python Image Library (PIL). The current version is 0.42.1, and it maintains an active, though not rapid, release cadence.
Common errors
-
AttributeError: module 'olefile' has no attribute 'OleFileIO'
cause Attempting to import or access `olefile.OleFileIO` directly as a submodule, or trying to instantiate `OleFileIO()` without a `file_handle` argument.fixThe main class is `OleFileIO` (often instantiated via `olefile.open()`). If you need the class directly, use `from olefile import OleFileIO`. More commonly, you'd use `ole = olefile.open('path/to/file.doc')` to get an instance. -
olefile.BadOleFile: Not an OLE2 file
cause The file provided to `olefile.open()` is not a valid Microsoft OLE2 Structured Storage file. This can happen with modern Office files (.docx, .xlsx) which are ZIP archives, or entirely different file types.fixEnsure the input file is an actual OLE file (e.g., older .doc, .xls, .ppt files, or some MSI installers). Use `olefile.isOleFile(filepath)` to pre-check the file type before attempting to open it. -
olefile.OleFileIO.StreamNotFound: Stream '...' not found
cause You attempted to open a stream or storage using `ole.openstream()` or `ole.openstorage()` that either does not exist within the OLE file or whose name was provided with incorrect casing.fixUse `ole.listdir()` to get a list of all available streams and storages within the file and verify the exact name and its casing. Consider using `if ole.exists(stream_name):` before attempting to open a stream. -
TypeError: object of type 'OleProperty' has no len()
cause You are trying to treat an `OleProperty` object (returned by `getproperties()` or `get_property_by_id()`) directly as a string or a collection, without extracting its actual value.fixCall the appropriate conversion method on the `OleProperty` object to get its underlying value. For example, use `prop_value.as_str()` for strings, `prop_value.as_int()` for integers, or `prop_value.as_datetime()` for datetimes.
Warnings
- gotcha OleFileIO_PL loads the entire OLE file into memory when opened. This can lead to significant memory consumption and performance issues when processing very large OLE files.
- gotcha The `listdir()` method returns a list of tuples, where each tuple represents the path components of a stream or storage. It does not return joined paths (e.g., `[('WordDocument',)]` instead of `['WordDocument']`).
- gotcha Stream and storage names within OLE files are case-sensitive by default in OleFileIO_PL. Searching for a stream with incorrect casing will result in it not being found.
- gotcha When retrieving properties using methods like `getproperties()`, the values are returned as `OleProperty` objects, not raw Python types (strings, integers, datetimes).
Install
-
pip install olefileio-pl
Imports
- OleFileIO
import olefile.OleFileIO
from olefile import OleFileIO
- open
import olefile ole = olefile.open('file.doc') - isOleFile
import olefile is_ole = olefile.isOleFile('file.doc')
Quickstart
import olefile
import os
import tempfile
# For demonstration, let's create a dummy (non-OLE) file to show error handling.
# In a real scenario, you would point to an actual OLE file
# (e.g., an old .doc, .xls, .ppt document).
dummy_file_path = os.path.join(tempfile.gettempdir(), "dummy_not_ole.txt")
with open(dummy_file_path, "w") as f:
f.write("This is not an OLE file.")
# <<< IMPORTANT: REPLACE THIS with the actual path to your OLE file >>>
actual_ole_file_path = "path/to/your/actual_ole_file.doc"
print(f"Checking if '{dummy_file_path}' is an OLE file: {olefile.isOleFile(dummy_file_path)}")
print(f"Checking if '{actual_ole_file_path}' is an OLE file: {olefile.isOleFile(actual_ole_file_path)}\n")
try:
# Attempt to open a placeholder file - this will likely fail unless
# you replace 'actual_ole_file_path' with a real OLE file.
print(f"Attempting to open '{actual_ole_file_path}'...")
with olefile.open(actual_ole_file_path) as ole:
print(f"Successfully opened '{actual_ole_file_path}'.")
# List all top-level streams and storages
print("\nTop-level entries:")
for entry in ole.listdir():
print(f" - {entry}")
# Example: Access a stream if it exists (e.g., 'WordDocument' for .doc files)
if ole.exists('WordDocument'):
with ole.openstream('WordDocument') as stream:
content = stream.read()
print(f"\nFirst 100 bytes of 'WordDocument' stream: {content[:100]}")
else:
print("\n'WordDocument' stream not found.")
# Example: Read standard metadata properties (if available)
print("\nMetadata (if present):")
if ole.exists('Root Entry'):
root_props = ole.getproperties('Root Entry')
if root_props:
for prop_id, prop_name, prop_type, prop_value in root_props:
# Common properties like Author (0x04) or Creation Time (0x01)
if prop_id == 0x01: print(f" Creation Time: {prop_value.as_datetime()}")
elif prop_id == 0x04: print(f" Author: {prop_value.as_str()}")
else: print(f" Property {hex(prop_id)} ({prop_name}): {prop_value}")
else:
print(" No standard properties found in Root Entry.")
else:
print(" 'Root Entry' not found.")
except olefile.BadOleFile:
print(f"\nError: '{actual_ole_file_path}' is not a valid OLE file. Please provide a real OLE file.")
except FileNotFoundError:
print(f"\nError: '{actual_ole_file_path}' not found. Please provide a valid path to an OLE file.")
except Exception as e:
print(f"\nAn unexpected error occurred: {e}")
finally:
os.remove(dummy_file_path) # Clean up dummy file
print(f"\nCleaned up dummy file: {dummy_file_path}")