{"id":10003,"library":"olefileio-pl","title":"OleFileIO_PL for Microsoft OLE2 Files","description":"OleFileIO_PL is a Python package for parsing, reading, and writing Microsoft OLE2 files, also known as Structured Storage or Compound Document files (e.g., older Microsoft Office formats such as .doc, .xls, and .ppt). It is an improved version of the OleFileIO module from the Python Imaging Library (PIL). The project was renamed to olefile as of version 0.42, and the examples in this entry use the `olefile` module name. The current version of this package is 0.42.1; newer releases are published under the olefile name.","status":"active","version":"0.42.1","language":"en","source_language":"en","source_url":"https://github.com/decalage2/olefile","tags":["office","ole","structured storage","compound document","ms office","file format","parsing"],"install":[{"cmd":"pip install olefileio-pl","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"note":"OleFileIO is a class in the top-level 'olefile' module, not a submodule, so it cannot be imported with a dotted 'import' statement.","wrong":"import olefile.OleFileIO","symbol":"OleFileIO","correct":"from olefile import OleFileIO"},{"note":"The 'olefile' module has no top-level 'open' function. Open an OLE file by passing its path to the OleFileIO constructor; 'open' exists only as a method of OleFileIO, and the constructor calls it when given a file.","wrong":"ole = olefile.open('file.doc')","symbol":"open","correct":"import olefile\nole = olefile.OleFileIO('file.doc')"},{"note":"Use 'isOleFile' from the 'olefile' module to quickly check if a file is an OLE file based on its magic bytes, without fully opening it.","symbol":"isOleFile","correct":"import olefile\nis_ole = olefile.isOleFile('file.doc')"}],"quickstart":{"code":"import olefile\nimport os\nimport tempfile\n\n# For demonstration, create a dummy (non-OLE) file to show the isOleFile() check.\n# In a real scenario, you would point to an actual OLE file\n# (e.g., an old .doc, .xls, .ppt document).\ndummy_file_path = os.path.join(tempfile.gettempdir(), \"dummy_not_ole.txt\")\nwith open(dummy_file_path, \"w\") as f:\n    f.write(\"This is not an OLE file.\")\n\n# <<< IMPORTANT: REPLACE THIS with the actual path to your OLE file >>>\nactual_ole_file_path = \"path/to/your/actual_ole_file.doc\"\n\nprint(f\"Checking if '{dummy_file_path}' is an OLE file: {olefile.isOleFile(dummy_file_path)}\")\n\nole = None\ntry:\n    # isOleFile() opens the file, so a missing path raises FileNotFoundError here.\n    print(f\"Checking if '{actual_ole_file_path}' is an OLE file: {olefile.isOleFile(actual_ole_file_path)}\\n\")\n\n    # olefile has no module-level open(); pass the path to the OleFileIO constructor.\n    print(f\"Attempting to open '{actual_ole_file_path}'...\")\n    ole = olefile.OleFileIO(actual_ole_file_path)\n    print(f\"Successfully opened '{actual_ole_file_path}'.\")\n\n    # List all streams; each entry is a list of path components.\n    print(\"\\nStreams:\")\n    for entry in ole.listdir():\n        print(f\"  - {'/'.join(entry)}\")\n\n    # Example: read a stream if it exists (e.g., 'WordDocument' for .doc files)\n    if ole.exists('WordDocument'):\n        stream = ole.openstream('WordDocument')\n        content = stream.read()\n        print(f\"\\nFirst 100 bytes of 'WordDocument' stream: {content[:100]}\")\n    else:\n        print(\"\\n'WordDocument' stream not found.\")\n\n    # Example: read standard metadata properties (if available).\n    # getproperties() returns a dict mapping integer property IDs to values.\n    print(\"\\nMetadata (if present):\")\n    summary = '\\x05SummaryInformation'\n    if ole.exists(summary):\n        props = ole.getproperties(summary, convert_time=True)\n        # In SummaryInformation, 0x04 is Author and 0x0C is Creation Time.\n        print(f\"  Author: {props.get(0x04)}\")\n        print(f\"  Creation Time: {props.get(0x0C)}\")\n    else:\n        print(\"  No SummaryInformation stream found.\")\n\nexcept FileNotFoundError:\n    print(f\"\\nError: '{actual_ole_file_path}' not found. Please provide a valid path to an OLE file.\")\nexcept OSError as e:\n    # olefile reports a non-OLE or malformed file as an IOError/OSError.\n    print(f\"\\nError: '{actual_ole_file_path}' could not be read as an OLE file: {e}\")\nexcept Exception as e:\n    print(f\"\\nAn unexpected error occurred: {e}\")\nfinally:\n    if ole is not None:\n        ole.close()\n    os.remove(dummy_file_path)  # Clean up dummy file\n    print(f\"\\nCleaned up dummy file: {dummy_file_path}\")","lang":"python","description":"This quickstart demonstrates how to check whether a file is an OLE document, open it, list its streams, and extract common metadata. To see more than the error-handling path, replace `actual_ole_file_path` with the path to an existing, valid OLE file."},"warnings":[{"fix":"Check the size with `ole.get_size(name)` before calling `ole.openstream(name)`, and avoid opening very large streams on memory-constrained systems.","message":"`openstream()` reads the entire stream into memory and returns it as an in-memory file-like object. This can lead to significant memory consumption when a file contains very large streams.","severity":"gotcha","affected_versions":"All versions"},{"fix":"If you need a joined path for display, join the components yourself, e.g., `'/'.join(entry)`. The lists can be passed to `exists()` and `openstream()` as they are.","message":"The `listdir()` method returns a list of lists, where each inner list holds the path components of one stream (e.g., `[['WordDocument']]` instead of `['WordDocument']`). By default only streams are listed, not storages.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always use the exact case of stream and storage names as reported by `ole.listdir()`. 
Lookups themselves ignore case, so this is for consistency rather than correctness. Use `ole.exists(name)` to check for existence before calling `ole.openstream(name)`, which raises an IOError (OSError on Python 3) if the stream is missing.","message":"Stream and storage names are matched case-insensitively by olefile, so `ole.exists('worddocument')` and `ole.exists('WordDocument')` refer to the same entry. Code that expects two names differing only in case to be distinct will not behave as intended.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Look values up by property ID, e.g., `props.get(4)` for the author in the `'\\x05SummaryInformation'` stream, and pass `convert_time=True` if you want timestamps as datetime objects. For named attributes such as `author` and `create_time`, use `ole.get_metadata()`.","message":"`getproperties()` returns a plain dict that maps integer property IDs to values. It does not return tuples or wrapper objects, and the values have no `.as_str()` or `.as_datetime()` conversion methods.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-17T00:00:00.000Z","next_check":"2026-07-16T00:00:00.000Z","problems":[{"fix":"Open files through the `OleFileIO` class: `ole = olefile.OleFileIO('path/to/file.doc')`. If you need the class name directly, use `from olefile import OleFileIO`.","cause":"Calling `olefile.open()`. The module has no top-level `open` function; `open` exists only as a method of `OleFileIO`, and the constructor calls it when given a file.","error":"AttributeError: module 'olefile' has no attribute 'open'"},{"fix":"Ensure the input file is an actual OLE file (e.g., older .doc, .xls, .ppt files, or some MSI installers). Use `olefile.isOleFile(filepath)` to check the file type before attempting to open it.","cause":"The file passed to `olefile.OleFileIO()` is not a valid Microsoft OLE2 Structured Storage file. This happens with modern Office files (.docx, .xlsx), which are ZIP archives, and with unrelated file types.","error":"OSError: not an OLE2 structured storage file"},{"fix":"Use `ole.listdir()` to list the streams in the file and verify the exact name and full path. 
Consider using `if ole.exists(stream_name):` before attempting to open a stream.","cause":"You called `ole.openstream()` with a name that does not exist in the OLE file, for example because a stream nested inside a storage was given without its full path.","error":"OSError: file not found"},{"fix":"Iterate over `props.items()` or look up individual IDs with `props.get(property_id)`. For example, in the `'\\x05SummaryInformation'` stream, ID 4 is the author and ID 12 is the creation time.","cause":"You are iterating over the result of `getproperties()` as if it were a list of tuples. It is a dict keyed by integer property ID, so iteration yields the IDs themselves.","error":"TypeError: cannot unpack non-iterable int object"}]}