{"id":3265,"library":"rtfde","title":"RTFDE - RTF Encapsulated HTML Extractor","description":"RTFDE (RTF De-Encapsulator) is a Python library that extracts HTML content from RTF-encapsulated HTML, a format commonly found in the bodies of Exchange MSG email files. It parses the RTF and de-encapsulates the embedded content, and it operates on raw byte input. The current version is 0.1.2.2, and the project is actively maintained.","status":"active","version":"0.1.2.2","language":"en","source_language":"en","source_url":"https://github.com/seamustuohy/RTFDE","tags":["email","rtf","html","parsing","extractor","msg"],"install":[{"cmd":"pip install rtfde","lang":"bash","label":"Install stable version"}],"dependencies":[{"reason":"Required for parsing RTF grammar.","package":"lark","optional":false}],"imports":[{"note":"The main class lives in the `deencapsulate` submodule, not directly under the top-level package. The package name is uppercase (`RTFDE`), and Python imports are case-sensitive.","wrong":"from rtfde import DeEncapsulator","symbol":"DeEncapsulator","correct":"from RTFDE.deencapsulate import DeEncapsulator"}],"quickstart":{"code":"from RTFDE.deencapsulate import DeEncapsulator\n\n# 'message_body.rtf' is a placeholder path to an RTF body taken from a .msg file\nwith open('message_body.rtf', 'rb') as fp:\n    raw_rtf = fp.read()  # input must be bytes\n\nrtf_obj = DeEncapsulator(raw_rtf)\nrtf_obj.deencapsulate()\n\n# The encapsulated content is either HTML or plain text\nif rtf_obj.content_type == 'html':\n    print(rtf_obj.html)\nelse:\n    print(rtf_obj.text)","lang":"python","description":"Construct `DeEncapsulator` with the raw RTF bytes, call `deencapsulate()`, then read the result from the `html` or `text` attribute depending on `content_type`. The input must be RTF that encapsulates HTML or plain text, as produced by Exchange and Outlook for message bodies."},"warnings":[{"fix":"Ensure all RTF input is `bytes` before passing it to the `DeEncapsulator` constructor, for example by opening the file in binary mode (`open(path, 'rb')`) or by encoding an existing string (e.g., `my_rtf_string.encode('utf-8')`).","message":"Starting from version 0.1.0, `DeEncapsulator` requires `bytes` as input. Prior versions accepted string input, which is no longer supported.","severity":"breaking","affected_versions":">=0.1.0"},{"fix":"Upgrade to version 0.1.2.2 or newer to benefit from the fix for invalid Unicode escape sequence handling.","message":"Versions of `rtfde` prior to 0.1.2.2 contained a bug (Issue #34) where invalid Unicode escape sequences within RTF byte strings could lead to parsing errors or incorrect output.","severity":"gotcha","affected_versions":"<0.1.2.2"},{"fix":"Wrap the call to `deencapsulate()` in error handling and validate the extracted HTML content. Consider preprocessing or sanitizing the RTF source if its integrity is frequently questionable.","message":"The library is designed to extract HTML from potentially complex and sometimes malformed RTF structures, especially those found in email message bodies. Malformed RTF input may result in partial, incorrect, or no HTML being extracted.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-11T00:00:00.000Z","next_check":"2026-07-10T00:00:00.000Z"}