{"id":2296,"library":"striprtf","title":"striprtf","description":"striprtf is a simple Python library that converts Rich Text Format (RTF) files or strings into plain text. It handles common RTF parsing challenges, including encoding detection from the RTF's codepage directive (falling back to 'cp1252' by default), Unicode decoding, and removal of embedded binary data. The current version is 0.0.29, released on March 27, 2025, and the project is actively maintained.","status":"active","version":"0.0.29","language":"en","source_language":"en","source_url":"https://github.com/joshy/striprtf.git","tags":["rtf","text conversion","document processing","plain text"],"install":[{"cmd":"pip install striprtf","lang":"bash","label":"Install stable version"}],"dependencies":[],"imports":[{"symbol":"rtf_to_text","correct":"from striprtf.striprtf import rtf_to_text"}],"quickstart":{"code":"from striprtf.striprtf import rtf_to_text\n\n# Example RTF string (simplified for demonstration, with balanced braces)\nrtf_string = r\"\"\"{\\rtf1\\ansi\\deff0\\nouicompat{\\fonttbl{\\f0\\fnil\\fcharset0 Calibri;}{\\f1\\fnil\\fcharset2 Symbol;}}\n{\\pard\\sa200\\sl276\\slmult1\\f0\\fs22\\lang9081 This is some \\b bold\\b0  text and some \\i italic\\i0  text.\\par\nHere is a \\ul hyperlink\\ul0 : {\\field{\\*\\fldinst HYPERLINK \"https://example.com\"}{\\fldrslt example.com}}\\pard\\sa200\\sl276\\slmult1\\par\n}\n}\"\"\"\n\n# Convert the RTF string to plain text\nplain_text = rtf_to_text(rtf_string)\nprint(plain_text)\n\n# If the RTF has no \\ansicpg codepage directive, pass the correct `encoding`\n# (default \"cp1252\"); errors=\"ignore\" skips undecodable input (v0.0.25+).\n# plain_text = rtf_to_text(rtf_string, encoding=\"utf-8\", errors=\"ignore\")\n\n# To convert an RTF file:\n# try:\n#     with open(\"your_file.rtf\", \"r\", encoding=\"cp1252\") as f:\n#         rtf_content = f.read()\n#     file_plain_text = rtf_to_text(rtf_content)\n#     print(file_plain_text)\n# except FileNotFoundError:\n#     print(\"Error: RTF file not found.\")\n# except UnicodeDecodeError as e:\n#     print(f\"Could not decode the file: {e}\")","lang":"python","description":"Convert an RTF-formatted string to plain text with `rtf_to_text`, the library's primary interface. Pass `encoding` when the RTF declares no codepage and is not encoded in the default 'cp1252', and `errors` (e.g. 'ignore') to control how decoding failures are handled."},"warnings":[{"fix":"Use Python 3.8 or newer, upgrading the interpreter if necessary.","message":"Python 2 support was dropped after version 0.0.13; later versions require Python >=3.8.","severity":"breaking","affected_versions":"<=0.0.13 support Python 2; >=0.0.14 require Python 3.8+"},{"fix":"Check which encoding your RTF input uses and pass `encoding` to `rtf_to_text` if it differs from the default or if you see decoding errors. To control decoding-error handling with `errors` (e.g. `errors=\"ignore\"`), use v0.0.25 or newer.","message":"The `encoding` parameter of `rtf_to_text` defaults to 'cp1252'. If the RTF declares its codepage with an `\\ansicpg` directive (e.g. `\\ansicpg932` for Japanese), the library detects and uses that codepage instead. If no codepage is declared and the content uses a different encoding, you must pass the correct `encoding` explicitly (e.g. `encoding=\"utf-8\"`) to avoid decoding errors or garbled text. Versions before v0.0.25 ignored the `errors` parameter.","severity":"gotcha","affected_versions":"All versions; the `errors` parameter is ignored before v0.0.25."},{"fix":"For critical applications with varied or complex RTF input, test thoroughly with representative samples. Pre-process malformed RTF, or use a more complete RTF parser if every RTF feature must be handled faithfully.","message":"The library is designed for simplicity and may not fully parse or correctly strip complex or malformed RTF. Common elements are handled well, but intricate formatting, embedded objects other than basic images, or badly corrupted RTF may yield incomplete or noisy text.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-09T00:00:00.000Z","next_check":"2026-07-08T00:00:00.000Z"}