MSG Parser
The `msg-parser` module reads, parses, and converts Microsoft Outlook MSG email files. It extracts email properties, handles nested MSG/EML attachments, and outputs message content as a JSON string or an EML file. The latest version is 1.2.0 (released December 2019), and the library is compatible with Python 3.4 and higher.
Common errors
- Exception: Invalid MSG file provided, 'properties_version1.0' stream data is empty.
cause The MSG file is corrupted, empty, or does not conform to the OLE2 Compound Document format expected for Outlook messages. Specifically, the '__properties_version1.0' stream is missing or empty.
fix Ensure the input file is a valid, intact Outlook MSG file. Verify that the file path is correct and that the file is not zero bytes long. If the file was generated by a third-party tool, check that tool's output for compliance.
- FileNotFoundError: [Errno 2] No such file or directory: 'path/to/your/email.msg'
cause The specified path to the MSG file does not exist or is incorrect, so Python cannot open the file.
fix Double-check the `msg_file_path` variable. Ensure the file exists at that exact location and that you have read permission. Use `os.path.exists()` as a pre-check, or provide an absolute path.
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0x__ in position __: invalid start byte
cause The library or an underlying component (e.g., `olefile`) is decoding a byte string as UTF-8, but the content uses a different encoding (e.g., CP1252, Shift_JIS, or another Unicode encoding) or contains byte sequences that are invalid in UTF-8. This often happens with non-English characters in older MSG files.
fix `msg-parser` generally handles encodings itself, so this error usually indicates an edge case. The `MsOxMessage` constructor does not expose an encoding option for the main file content, so the encoding cannot be overridden directly. If possible, inspect the problematic part of the file to determine its actual encoding.
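The first two errors can often be caught before the file reaches the parser. The following is a minimal sketch using only the standard library; `precheck_msg_file` is a hypothetical helper name, not part of `msg-parser`. It checks only that the file exists, is non-empty, and starts with the OLE2 Compound Document signature, so a file that passes can still be rejected by the library.

```python
import os

# Signature found at the start of every OLE2 Compound Document file.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def precheck_msg_file(path):
    """Return None if the file passes basic checks, else a short reason string.

    Hypothetical helper: it does not validate the MSG streams themselves.
    """
    if not os.path.isfile(path):
        return "file not found"
    if os.path.getsize(path) == 0:
        return "file is empty"
    # Read only the first eight bytes; the rest of the file is left untouched.
    with open(path, "rb") as handle:
        header = handle.read(len(OLE2_MAGIC))
    if header != OLE2_MAGIC:
        return "not an OLE2 compound document"
    return None
```

Calling `precheck_msg_file(msg_file_path)` before constructing `MsOxMessage` gives a specific reason for the most common failures instead of a generic exception.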
Warnings
- gotcha The library reads the entire MSG file into memory for parsing. For very large MSG files or batch processing of many files, this can lead to high memory consumption, potentially causing `MemoryError`.
- gotcha Parsing malformed or corrupted MSG files can raise a generic `Exception`, such as "Invalid MSG file provided, 'properties_version1.0' stream data is empty.", or cause unexpected behavior. The library expects a correctly structured OLE2 Compound Document.
- gotcha The library's last release was in December 2019. While functional, it may not receive updates for new Python versions (beyond current compatibility with 3.4+) or support the very latest intricacies of Microsoft Outlook's MSG format, which can evolve.
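For batch jobs, the memory and malformed-file warnings above suggest handling one file at a time and isolating failures. The sketch below is one way to do that; `iter_parsed`, the `parse` callable, and the `max_bytes` threshold are all assumptions made for illustration, not library features. The generator keeps no reference to earlier results, so each parsed message can be released before the next file is loaded.

```python
import os


def iter_parsed(paths, parse, max_bytes=50 * 1024 * 1024):
    """Yield (path, result, error) tuples, processing one file at a time.

    `parse` is any callable that takes a path and returns a result.
    Files larger than `max_bytes` (an arbitrary default) are skipped.
    """
    for path in paths:
        result, error = None, None
        try:
            size = os.path.getsize(path)
            if size > max_bytes:
                error = f"skipped: {size} bytes exceeds limit of {max_bytes}"
            else:
                result = parse(path)
        except Exception as exc:  # malformed files surface as a generic Exception
            error = str(exc)
        yield path, result, error
```

With the library installed, `parse` could be `lambda p: MsOxMessage(p).get_properties()`, and the caller would write each result to disk or a database inside the loop rather than collecting everything in a list.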
Install
- pip install msg_parser
- pip install msg_parser[rtf]
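To confirm which variant ended up in the current environment, a quick check like the following may help. It is a sketch only: `is_importable` is a hypothetical helper, and the module name `compressed_rtf` for the `rtf` extra is an assumption that should be verified against the package metadata.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "compressed_rtf" is an assumed name for the dependency behind the rtf extra.
for name in ("msg_parser", "compressed_rtf"):
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```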
Imports
- MsOxMessage
from msg_parser import MsOxMessage
Quickstart
import os
from msg_parser import MsOxMessage

# Path to the MSG file: set the MSG_FILE_PATH environment variable,
# or replace 'path/to/your/email.msg' with your actual file.
msg_file_path = os.environ.get('MSG_FILE_PATH', 'path/to/your/email.msg')

if not os.path.exists(msg_file_path):
    print(f"Warning: MSG file not found at '{msg_file_path}'. Cannot run quickstart.")
    print("Please provide a valid MSG file path via MSG_FILE_PATH environment variable or directly.")
else:
    try:
        msg_obj = MsOxMessage(msg_file_path)

        # Get message properties as a dictionary
        properties = msg_obj.get_properties()
        print(f"Subject: {properties.get('subject')}")
        print(f"From: {properties.get('sender_name')}")

        # Get message body (plain text, html, or rtf)
        # The library tries to provide the 'cleanest' body available.
        body = msg_obj.body
        if body:
            print("\nBody snippet:")
            print(body[:200] + '...' if len(body) > 200 else body)

        # Create the output directory once, before the loop, so it also
        # exists for the EML file when the message has no attachments.
        output_dir = 'attachments_output'
        os.makedirs(output_dir, exist_ok=True)

        # Iterate and save attachments
        print(f"\nAttachments found: {len(msg_obj.attachments)}")
        for i, attachment in enumerate(msg_obj.attachments):
            attachment_path = os.path.join(output_dir, attachment.long_filename)
            attachment.save(attachment_path)
            print(f"Saved attachment {i+1}: {attachment_path}")

        # Convert message to EML format and save
        output_eml_path = os.path.join(output_dir, 'output_email.eml')
        msg_obj.save_email_file(output_eml_path)
        print(f"Converted MSG to EML: {output_eml_path}")

        # Get message as JSON string
        # json_string = msg_obj.get_message_as_json()
        # print("\nMessage as JSON (first 500 chars):")
        # print(json_string[:500] + '...')
    except Exception as e:
        print(f"Error processing MSG file: {e}")
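Attachment file names come from the email itself, so they may be empty or contain directory components. The sketch below shows one way to build a save path that stays inside the output directory; `safe_attachment_path` and the `attachment_<index>.bin` fallback name are hypothetical and not part of `msg-parser`.

```python
import os


def safe_attachment_path(output_dir, filename, index):
    """Build a path inside `output_dir`, ignoring any directories in `filename`."""
    # Normalise Windows separators, then keep only the final path component.
    name = os.path.basename((filename or "").replace("\\", "/"))
    if name in ("", ".", ".."):
        # Hypothetical fallback for missing or unusable names.
        name = f"attachment_{index}.bin"
    return os.path.join(output_dir, name)
```

In the quickstart loop, the returned value could stand in for the plain `os.path.join(output_dir, ...)` call, with `i + 1` passed as `index`.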