striprtf
striprtf is a small Python library that converts Rich Text Format (RTF) content, supplied as a string, into plain text. It handles common RTF parsing problems: encoding detection (falling back to 'cp1252' by default), Unicode decoding, and removal of binary data. The current version is 0.0.29, released on March 27, 2025, and the project continues to publish regular releases.
Warnings
- breaking Python 2 compatibility was dropped after version 0.0.13. The library now explicitly requires Python >=3.8.
- gotcha The `encoding` parameter of `rtf_to_text` defaults to 'cp1252'. If the RTF declares an explicit code page (e.g., 'ansicpg932' for Japanese), the library attempts to detect and use it. If no code page is declared and the content uses a different encoding, pass the correct one explicitly (e.g., `encoding="utf-8"`) to avoid decoding errors. A bug that caused the `errors` parameter to be ignored was fixed in v0.0.25.
- gotcha The library is designed for simplicity and may not fully parse or correctly strip all complex or highly malformed RTF structures. While it handles common elements well, very intricate formatting, embedded objects beyond basic images, or severely corrupted RTF might result in incomplete text extraction.
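The encoding gotcha above can be sketched as follows. This is a minimal illustration, not part of striprtf: `declared_codepage` and `convert` are hypothetical helper names, the regular expression only looks for an `\ansicpgN` control word, and `errors="replace"` is assumed to be an accepted value for the `errors` parameter.

```python
import re
from typing import Optional

# Matches the RTF header control word that declares an ANSI code page, e.g. \ansicpg932.
_ANSICPG = re.compile(r"\\ansicpg(\d+)")


def declared_codepage(rtf: str) -> Optional[int]:
    """Return the code page number declared in the RTF, or None if there is none."""
    match = _ANSICPG.search(rtf)
    return int(match.group(1)) if match else None


def convert(rtf: str, fallback_encoding: str = "cp1252") -> str:
    """Hypothetical wrapper: supply an explicit encoding only when none is declared."""
    # Imported lazily so the helper above works without striprtf installed.
    from striprtf.striprtf import rtf_to_text

    if declared_codepage(rtf) is None:
        # No \ansicpg in the header, so the caller has to know the encoding.
        # errors="replace" is an assumption; adjust to your needs.
        return rtf_to_text(rtf, encoding=fallback_encoding, errors="replace")
    # A code page is declared; per the note above, striprtf tries to detect it itself.
    return rtf_to_text(rtf)
```

Calling `convert(rtf_string, fallback_encoding="utf-8")` would then only apply the fallback when the document itself gives no hint about its code page.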
Install
pip install striprtf
Imports
- rtf_to_text
from striprtf.striprtf import rtf_to_text
Quickstart
from striprtf.striprtf import rtf_to_text
# Example RTF string (simplified for demonstration)
rtf_string = r"""{\rtf1\ansi\deff0\nouicompat{\fonttbl{\f0\fnil\fcharset0 Calibri;}{\f1\fnil\fcharset2 Symbol;}}
{\pard\sa200\sl276\slmult1\f0\fs22\lang9081 This is some \b bold\b0 text and some \i italic\i0 text.\par
Here is a \ul hyperlink\ul0 : {\field{\*\fldinst HYPERLINK "https://example.com"}{\fldrslt example.com}}\pard\sa200\sl276\slmult1\par}}
"""
# Convert RTF string to plain text
plain_text = rtf_to_text(rtf_string)
print(plain_text)
# To convert an RTF file:
# try:
#     with open("your_file.rtf", "r", encoding="cp1252") as f:
#         rtf_content = f.read()
#     file_plain_text = rtf_to_text(rtf_content)
#     print(file_plain_text)
# except FileNotFoundError:
#     print("Error: RTF file not found.")
# except Exception as e:
#     print(f"An error occurred: {e}")
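The file-conversion steps commented out above could also be wrapped in a small reusable function. The sketch below is one possible shape, not an API provided by striprtf: `rtf_file_to_text` and its `converter` parameter are hypothetical names, and the file is assumed to be readable as 'cp1252' text.

```python
from pathlib import Path
from typing import Callable, Optional, Union


def rtf_file_to_text(
    path: Union[str, Path],
    converter: Optional[Callable[[str], str]] = None,
    encoding: str = "cp1252",
) -> str:
    """Read an RTF file and return its plain-text content.

    `converter` defaults to striprtf's rtf_to_text; passing another callable
    makes the function easy to exercise without the library installed.
    """
    # Raises FileNotFoundError if the path does not exist.
    raw = Path(path).read_text(encoding=encoding)
    if converter is None:
        # Imported lazily; requires `pip install striprtf`.
        from striprtf.striprtf import rtf_to_text as converter
    return converter(raw)
```

Leaving `FileNotFoundError` to propagate, rather than catching every exception, lets the caller decide how to report a missing or unreadable file.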