RTFDE - RTF Encapsulated HTML Extractor

0.1.2.2 · active · verified Sat Apr 11

RTFDE (RTF De-Encapsulator) is a Python library that recovers the original HTML from RTF-encapsulated HTML, the format Microsoft Exchange and Outlook commonly use to store message bodies in MSG email files. It can also recover encapsulated plain text. RTFDE parses the RTF with a formal grammar rather than pattern-matching it, and it accepts only raw bytes as input. The library is currently at version 0.1.2.2 and is actively maintained, with releases mostly delivering bug fixes and minor updates.

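Warnings

Input must be raw bytes: read RTF data in binary mode (for example `open(path, 'rb')`) instead of passing a decoded `str`. RTF that is not encapsulated HTML or text, meaning it lacks a `\fromhtml1` or `\fromtext` header control word, is rejected with an exception rather than converted.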
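Before handing data to RTFDE, a caller may want a cheap check that the input is bytes and carries an encapsulation marker. The sketch below is hypothetical and not part of RTFDE: the function name `encapsulation_kind` is invented, and it assumes the marker (`\fromhtml1` or `\fromtext`) sits among the header control words before the first nested group, as in the examples of Microsoft's [MS-OXRTFEX] specification. RTFDE's own validation is more thorough.

```python
import re
from typing import Optional

# Hypothetical pre-check, NOT part of RTFDE. Assumes the encapsulation marker
# (\fromhtml1 or \fromtext) appears among the header control words that
# precede the first nested group, as in the [MS-OXRTFEX] examples.
_MARKER = re.compile(rb"\\(fromhtml1|fromtext)(?![a-zA-Z0-9])")


def encapsulation_kind(rtf: bytes) -> Optional[str]:
    """Return 'html', 'text', or None for RTF that does not look encapsulated."""
    if not isinstance(rtf, bytes):
        raise TypeError("expected raw RTF bytes, e.g. from open(path, 'rb')")
    data = rtf.lstrip()
    if not data.startswith(b"{\\rtf"):
        return None
    # Header: everything after the opening brace up to the first nested "{".
    header = data[1:].split(b"{", 1)[0]
    match = _MARKER.search(header)
    if match is None:
        return None
    return "html" if match.group(1) == b"fromhtml1" else "text"


print(encapsulation_kind(rb"{\rtf1\ansi\ansicpg1252\fromhtml1 \deff0{\fonttbl}}"))  # html
```

A `None` result suggests the data is ordinary RTF, which a de-encapsulator cannot turn back into HTML.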

Install

pip install RTFDE

Imports

from RTFDE.deencapsulate import DeEncapsulator

Quickstart

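Pass the RTF content as bytes to the `DeEncapsulator` constructor, then call its `deencapsulate` method. Afterwards the extracted HTML is available in the `html` attribute; for encapsulated plain text it is in the `text` attribute, and `content_type` reports which of the two applies. During extraction the library keeps the HTML stored in `\*\htmltag` destination groups together with the body text, and drops the RTF-only content that `\htmlrtf` marks.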

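from RTFDE.deencapsulate import DeEncapsulator

# Example RTF-encapsulated HTML (must be bytes). The rb'' prefix keeps sequences
# such as \r and \f as RTF control words instead of Python escape characters.
rtf_bytes = (
    rb'{\rtf1\ansi\ansicpg1252\fromhtml1 \deff0{\fonttbl{\f0\fswiss Helvetica;}}'
    rb'{\*\htmltag64 <p>}This is {\*\htmltag84 <b>}\htmlrtf \b \htmlrtf0 HTML'
    rb'{\*\htmltag92 </b>}\htmlrtf \b0 \htmlrtf0  content.{\*\htmltag72 </p>}}'
)

# The raw RTF goes to the constructor; deencapsulate() performs the extraction.
de = DeEncapsulator(rtf_bytes)
de.deencapsulate()

# content_type is 'html' for \fromhtml1 input and 'text' for \fromtext input.
if de.content_type == 'html':
    print(de.html)
else:
    print(de.text)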
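To make the extraction rules concrete, here is a deliberately simplified, stdlib-only sketch of de-encapsulation. It is not RTFDE's implementation and covers only a small subset of RTF: the name `toy_deencapsulate`, the set of skipped destinations, and emitting a single `\n` for `\par` are assumptions for illustration. It copies body text and `\htmltag` group contents to the output, suppresses text between `\htmlrtf` and `\htmlrtf0`, and skips font tables and other ignorable `\*` destinations.

```python
import re

# Hypothetical illustration of RTF de-encapsulation; NOT RTFDE's parser.
# Tokens: group braces, \'hh hex escapes, control words (optional numeric
# parameter plus one delimiting space), control symbols, raw CR/LF (ignored
# in RTF), and plain text runs.
TOKEN = re.compile(
    rb"(?P<open>\{)"
    rb"|(?P<close>\})"
    rb"|\\'(?P<hex>[0-9a-fA-F]{2})"
    rb"|\\(?P<word>[a-zA-Z]+)(?P<param>-?\d+)? ?"
    rb"|\\(?P<sym>[^a-zA-Z])"
    rb"|(?P<crlf>[\r\n]+)"
    rb"|(?P<text>[^\\{}\r\n]+)"
)

# Assumed set of destinations whose contents never reach the output.
SKIPPED = {b"fonttbl", b"colortbl", b"stylesheet", b"info"}


def toy_deencapsulate(rtf: bytes, codepage: str = "cp1252") -> str:
    out = []
    # Per-group state: [inside \htmlrtf, inside \htmltag, skipped destination]
    state = [False, False, False]
    saved = []    # states of the enclosing groups
    star = False  # True right after "\*" (ignorable destination marker)

    def emit(s: str) -> None:
        # \htmltag content is always HTML; other text is kept unless it is
        # RTF-only (\htmlrtf) or inside a skipped destination.
        if not state[2] and (state[1] or not state[0]):
            out.append(s)

    for m in TOKEN.finditer(rtf):
        if m.group("open"):
            saved.append(state.copy())  # child group starts with parent's state
        elif m.group("close"):
            state = saved.pop()         # leaving a group restores parent's state
        elif m.group("hex"):
            emit(bytes([int(m.group("hex"), 16)]).decode(codepage))
        elif m.group("word"):
            word, param = m.group("word"), m.group("param")
            if word == b"htmlrtf":
                state[0] = param != b"0"  # \htmlrtf or \htmlrtf1 on, \htmlrtf0 off
            elif word == b"htmltag":
                state[1] = True
            elif star or word in SKIPPED:
                state[2] = True
            elif word == b"par":
                emit("\n")                # simplification: one newline per \par
            elif word == b"tab":
                emit("\t")
        elif m.group("sym"):
            sym = m.group("sym")
            if sym == b"*":
                star = True
                continue                  # keep the marker for the next word
            if sym in (b"\\", b"{", b"}"):
                emit(sym.decode("ascii"))
        elif m.group("text"):
            emit(m.group("text").decode(codepage))
        star = False
    return "".join(out)


sample = (
    rb"{\rtf1\ansi\ansicpg1252\fromhtml1 \deff0{\fonttbl{\f0\fswiss Helvetica;}}"
    rb"{\*\htmltag64 <p>}This is {\*\htmltag84 <b>}\htmlrtf \b \htmlrtf0 HTML"
    rb"{\*\htmltag92 </b>}\htmlrtf \b0 \htmlrtf0  content.{\*\htmltag72 </p>}}"
)
print(toy_deencapsulate(sample))  # <p>This is <b>HTML</b> content.</p>
```

The `\b` and `\b0` bold toggles in the sample sit inside `\htmlrtf` regions, so they exist only for RTF readers and vanish from the recovered HTML, while the `<b>` tags come from the `\htmltag` groups. Real messages need far more of the RTF and [MS-OXRTFEX] rules than this sketch implements, which is what RTFDE provides.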
