HTML to DOCX Converter

1.1.4 · active · verified Thu Apr 16

html-for-docx is a Python library that converts HTML content into Microsoft Word (.docx) documents using `python-docx`. It is an actively maintained fork of the discontinued `pqzx/html2docx` project and continues that project's work on generating Word documents from HTML. The current version is 1.1.4; releases arrive regularly and focus on bug fixes and feature enhancements, including broader CSS and HTML tag support.

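Install

pip install html-for-docx

The package is installed as `html-for-docx` but imported as `html4docx`; it depends on `python-docx`.

Imports

from html4docx import HtmlToDocx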

Quickstart

This quickstart demonstrates the core features of html-for-docx: adding HTML strings to a `python-docx` Document object and saving it to a file, saving the result to an in-memory BytesIO buffer, and converting an HTML file directly.

from docx import Document
from html4docx import HtmlToDocx
from io import BytesIO

# Example 1: Add HTML to an existing Document object and save
document = Document()  # Or load an existing .docx: Document('template.docx')
parser = HtmlToDocx()
html_string = '<h1>Hello world</h1><p>This is a <strong>paragraph</strong> with some <em>formatting</em>.</p>'
parser.add_html_to_document(html_string, document)
document.save('output.docx')

print("Saved 'output.docx' with basic HTML content.")

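# Example 2: Convert an HTML string to an in-memory BytesIO buffer
# add_html_to_document() expects a python-docx Document, not a file-like object,
# so build the document first and then save it into the buffer
document_in_memory = Document()
parser_in_memory = HtmlToDocx()
html_string_2 = '<p style="color: blue;">This text is blue.</p>'
parser_in_memory.add_html_to_document(html_string_2, document_in_memory)

buffer = BytesIO()
document_in_memory.save(buffer)

# Reset the position before reading from the buffer again (e.g. to stream it)
buffer.seek(0)
print(f"Generated DOCX in memory, size: {len(buffer.getvalue())} bytes.")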

# Example 3: Convert an HTML file directly
# Create a dummy HTML file for demonstration
with open('input.html', 'w', encoding='utf-8') as f:
    f.write('<h2>Content from file</h2><p>This was converted from an HTML file.</p>')

file_parser = HtmlToDocx()
file_parser.parse_html_file('input.html', 'output_from_file.docx')

print("Saved 'output_from_file.docx' from 'input.html'.")
