libpff-python
libpff-python provides Python bindings for libpff, an open-source library for reading the Personal Folder File (PFF) and Offline Folder File (OFF) formats, which Microsoft Outlook uses in PST, OST, and PAB files to store emails, contacts, and other data. The current version is 20231205. Releases are infrequent, appearing roughly once a year or every few years, often alongside updates to the underlying C library. [2, 1, 3]
Common errors
- OSError: pypff_file_open: unable to open file. libpff_io_handle_read_file_header: invalid file signature.
  Cause: The specified file is corrupt, is not a valid PST file, or the path points to the wrong file.
  Fix: Verify that the PST file is valid and not corrupted. Double-check the file path and permissions. Ensure the file is not currently open in another process. [20]
- error: subprocess-exited-with-error × Building wheel for libpff-python (pyproject.toml) did not run successfully. │ exit code: 1 ╰─> [5 lines of output] running bdist_wheel running build running build_ext building 'pypff' extension error: Microsoft Visual C++ 14.0 or greater is required.
  Cause: On Windows, installing `libpff-python` requires a C compiler (such as MSVC) to build the C extension, and the compiler was not found or is not configured correctly.
  Fix: Install 'Build Tools for Visual Studio' with the 'Desktop development with C++' workload. Ensure the environment variables are set so that Python's build process can find the compiler. [16]
- OSError: pypff_folder_get_number_of_sub_messages: unable to retrieve number of sub messages.
  Cause: This error is reported as a breaking change in `libpff` or its Python bindings between versions 20211114 and 20231205, in which the API for accessing sub-messages changed or became incompatible.
  Fix: If this appears after upgrading to 20231205, consult the `libyal/libpff` GitHub repository for updated usage patterns, or downgrade to the previous stable version (e.g., `pip install libpff-python==20211114`). [11]
- ImportError: No module named 'pypff'
  Cause: The `libpff-python` package was not installed successfully, or it was installed into a different Python environment from the one currently in use.
  Fix: Ensure `pip install libpff-python` completed without errors. Activate the correct virtual environment if one is being used. Verify that the compiled `pypff` extension module exists in your site-packages directory. [23]
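The invalid file signature error can often be anticipated before pypff is called at all. The sketch below assumes that files in the PFF family (PST, OST, PAB) begin with the four bytes `!BDN`; the helper name `looks_like_pff` is hypothetical and is not part of pypff. A passing check does not prove the file is intact, only that it is worth handing to the library.

```python
PFF_SIGNATURE = b"!BDN"  # assumed leading bytes of PST/OST/PAB files


def looks_like_pff(path):
    """Return True if the file at `path` starts with the PFF signature.

    Hypothetical helper: unreadable or missing files return False
    instead of raising.
    """
    try:
        with open(path, "rb") as handle:
            return handle.read(len(PFF_SIGNATURE)) == PFF_SIGNATURE
    except OSError:
        return False
```

Calling `looks_like_pff("example.pst")` before `pypff.file().open(...)` separates "wrong kind of file" from deeper corruption that only the library can detect.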
Warnings
- breaking Updating `libpff-python` from version 20211114 to 20231205 introduced breaking changes, specifically affecting methods related to accessing sub-messages and attachments. Applications built with older versions may encounter `OSError` or incorrect data retrieval. [11]
- gotcha Installation on Windows often fails if the required Microsoft Visual C++ build tools are not installed, leading to compilation errors. This is due to `libpff-python` being a C extension that needs to be compiled. [16, 23]
- gotcha The Python module is imported as `pypff`, not `libpff-python`. Confusingly, there was also a `libpff-python-ratom` package in the past that provided similar functionality but sometimes had more up-to-date features; ensure you are using `libpff-python` and importing `pypff`. [8, 10, 19, 1]
- gotcha The Python bindings (`pypff`) are considered 'work in progress' by maintainers, implying that API stability is not guaranteed and breaking changes can occur with updates. [11]
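Given the reported instability around sub-message access, one defensive pattern is to isolate those calls so that a single unreadable folder or message does not abort a whole traversal. This is a minimal sketch: `safe_sub_messages` is a hypothetical helper, and it assumes only the `get_number_of_sub_messages()` and `get_sub_message()` calls, with failures surfacing as `OSError` as in the errors listed above.

```python
def safe_sub_messages(folder):
    """Yield the messages of `folder`, skipping anything that fails to read.

    Hypothetical helper; `folder` is expected to behave like a pypff folder.
    """
    try:
        count = folder.get_number_of_sub_messages()
    except OSError as exc:
        print(f"Skipping folder, cannot count messages: {exc}")
        return
    for index in range(count):
        try:
            yield folder.get_sub_message(index)
        except OSError as exc:
            print(f"Skipping message {index}: {exc}")
```

The trade-off is that errors are logged rather than raised, so a caller that needs a complete export should count the skips and treat any non-zero count as a failure.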
Install
- pip install libpff-python
- sudo apt-get install python3-pip libpff-dev && pip3 install libpff-python
- brew install libpff && pip3 install libpff-python
Imports
- pypff
import pypff
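Because the distribution name (`libpff-python`) and the module name (`pypff`) differ, it can help to confirm that the module is visible to the interpreter actually being run. The sketch below uses only the standard library; `module_available` is a hypothetical helper, and the `get_version()` lookup is guarded with `getattr` because its presence in a given build is an assumption.

```python
import importlib
import importlib.util
import sys


def module_available(name):
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


if module_available("pypff"):
    pypff = importlib.import_module("pypff")
    # get_version is assumed to exist; fall back gracefully if it does not.
    get_version = getattr(pypff, "get_version", None)
    print("pypff found, version:", get_version() if get_version else "unknown")
else:
    print(f"pypff is not importable from {sys.executable}")
```

Printing `sys.executable` in the failure branch makes it clear which environment was searched, which is the usual culprit when the install succeeded elsewhere.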
Quickstart
import os

import pypff


def process_pst_file(pst_file_path):
    """Open a PST file and print its folder tree with message subjects."""
    if not os.path.exists(pst_file_path):
        print(f"Error: PST file not found at {pst_file_path}")
        return

    def recurse_folders(folder, level=0):
        indent = "  " * level
        for sub_folder_index in range(folder.get_number_of_sub_folders()):
            sub_folder = folder.get_sub_folder(sub_folder_index)
            print(f"{indent}- Folder: {sub_folder.get_name()}")
            for message_index in range(sub_folder.get_number_of_sub_messages()):
                message = sub_folder.get_sub_message(message_index)
                print(f"{indent}  - Message: {message.get_subject()} (from: {message.get_sender_name()})")
                # Access other message properties, e.g., message.get_plain_text_body(),
                # message.get_client_submit_time()
            if sub_folder.get_number_of_sub_folders() > 0:
                recurse_folders(sub_folder, level + 1)

    try:
        pst = pypff.file()
        pst.open(pst_file_path)
        print(f"Opened PST file: {pst_file_path}")
        root_folder = pst.get_root_folder()
        print(f"Root folder: {root_folder.get_name()}")
        recurse_folders(root_folder)
        pst.close()
    except OSError as e:
        # pypff reports libpff failures as OSError (see "Common errors")
        print(f"Error processing PST file: {e}")
    except Exception as e:
        print(f"An unexpected error occurred: {e}")


# Example usage:
# The path defaults to 'example.pst'; set PST_FILE_PATH to point at your actual PST file.
pst_path = os.environ.get('PST_FILE_PATH', 'example.pst')
process_pst_file(pst_path)
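The quickstart prints as it walks the tree. When the results need further processing, the same traversal can be written as a generator. This is a minimal sketch that assumes only the folder accessors used in the quickstart; `iter_messages` is a hypothetical name, and unlike the quickstart it also visits messages stored directly in the starting folder.

```python
def iter_messages(folder, path=""):
    """Yield (folder_path, message) pairs for `folder` and all descendants.

    Hypothetical helper; `folder` is expected to behave like a pypff folder.
    """
    # Messages stored directly in this folder.
    for index in range(folder.get_number_of_sub_messages()):
        yield path or "/", folder.get_sub_message(index)
    # Recurse into each sub-folder, extending the path.
    for index in range(folder.get_number_of_sub_folders()):
        sub_folder = folder.get_sub_folder(index)
        name = sub_folder.get_name() or "(unnamed)"
        yield from iter_messages(sub_folder, f"{path}/{name}")
```

With an opened file, iterating over `iter_messages(pst.get_root_folder())` and reading `message.get_subject()` for each pair would list every message alongside its folder path, which is a convenient starting point for filtering or export.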