HTML to DOCX Converter (htmldocx)

0.0.6 · maintenance · verified Sun Apr 12

The `htmldocx` library converts HTML content to DOCX format, building on `python-docx` and `beautifulsoup4`. Its last release was in August 2021, and the project is in maintenance mode; more actively developed forks address limitations and bugs present in this version.

Warnings

Install

pip install htmldocx

Imports

from htmldocx import HtmlToDocx

Quickstart

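Initialise `HtmlToDocx`, then call `add_html_to_document` to insert HTML into an existing `python-docx` `Document` object. For direct conversion, `parse_html_file` converts an HTML file to a DOCX file, and `parse_html_string` returns a new document object built from an HTML string.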

from docx import Document
from htmldocx import HtmlToDocx

document = Document()
new_parser = HtmlToDocx()

html_content = '<h1>Hello world</h1><p>This is a paragraph.</p>'

# Add HTML to an existing Document object
new_parser.add_html_to_document(html_content, document)

# Save the document
document.save('your_file_name.docx')

# Or convert a file directly
# new_parser.parse_html_file('input.html', 'output.docx')

# Or convert from an HTML string to a new docx object
# docx_object = new_parser.parse_html_string('<h2>Another title</h2>')
# docx_object.save('another_file.docx')
