striprtf

0.0.29 · active · verified Thu Apr 09

striprtf is a small Python library that converts Rich Text Format (RTF) content, read from a file or held in a string, into plain text. It handles common RTF parsing problems: detecting the document's encoding (falling back to 'cp1252' by default), decoding Unicode escapes, and stripping embedded binary data. The current version is 0.0.29, released on March 27, 2025, and the project is actively maintained.

Warnings

Install

Imports

Quickstart

Convert an RTF-formatted string directly to plain text. The `rtf_to_text` function is the primary interface. Pass the `encoding` argument if the document uses an encoding other than the default 'cp1252'.

from striprtf.striprtf import rtf_to_text

# Example RTF string (simplified for demonstration)
rtf_string = r"""{\rtf1\ansi\deff0\nouicompat{\fonttbl{\f0\fnil\fcharset0 Calibri;}{\f1\fnil\fcharset2 Symbol;}}
{\pard\sa200\sl276\slmult1\f0\fs22\lang9081 This is some \b bold\b0  text and some \i italic\i0  text.\par
Here is a \ul hyperlink\ul0 : {\field{\*\fldinst HYPERLINK "https://example.com"}{\fldrslt example.com}}\pard\sa200\sl276\slmult1\par
}}
"""

# Convert RTF string to plain text
plain_text = rtf_to_text(rtf_string)
print(plain_text)

# To convert an RTF file:
# try:
#     with open("your_file.rtf", "r", encoding="cp1252") as f:
#         rtf_content = f.read()
#     file_plain_text = rtf_to_text(rtf_content)
#     print(file_plain_text)
# except FileNotFoundError:
#     print("Error: RTF file not found.")
# except Exception as e:
#     print(f"An error occurred: {e}")
