html2docx

1.6.0 · active · verified Fri Apr 17

html2docx is a Python library that converts valid HTML into Microsoft Word (.docx) documents. It uses `python-docx` to generate the document and `BeautifulSoup` to parse the HTML, translating common HTML structures and basic styling into an editable Word file. The current version is 1.6.0; releases mostly deliver bug fixes and minor features rather than major API changes.
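
To make the idea of translating HTML styling into Word formatting concrete, here is a minimal sketch of the parsing half. It is not html2docx's implementation: the standard library's `html.parser` stands in for `BeautifulSoup` to keep the example dependency-free, and the names `RunCollector` and `collect_runs`, as well as the `(text, bold, italic)` tuple format, are illustrative assumptions. A real converter would pass each tuple on to `python-docx` as a formatted run.

```python
from html.parser import HTMLParser


class RunCollector(HTMLParser):
    """Collect (text, bold, italic) tuples, roughly one per Word run."""

    BOLD_TAGS = {"strong", "b"}
    ITALIC_TAGS = {"em", "i"}

    def __init__(self):
        super().__init__()
        self.bold_depth = 0    # number of open bold tags
        self.italic_depth = 0  # number of open italic tags
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BOLD_TAGS:
            self.bold_depth += 1
        elif tag in self.ITALIC_TAGS:
            self.italic_depth += 1

    def handle_endtag(self, tag):
        if tag in self.BOLD_TAGS:
            self.bold_depth = max(0, self.bold_depth - 1)
        elif tag in self.ITALIC_TAGS:
            self.italic_depth = max(0, self.italic_depth - 1)

    def handle_data(self, data):
        # Record the text together with the formatting active at this point.
        self.runs.append((data, self.bold_depth > 0, self.italic_depth > 0))


def collect_runs(html):
    """Parse an HTML fragment and return its text runs with formatting flags."""
    parser = RunCollector()
    parser.feed(html)
    parser.close()
    return parser.runs
```

Tracking nesting depth rather than a single boolean keeps the flags correct when the same kind of tag is nested, for example `<b><strong>x</strong> y</b>`.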

Quickstart

This quickstart demonstrates how to convert a string of HTML content into a .docx file using `html2docx`. It initializes `Html2Docx`, parses the HTML, and saves the resulting `python-docx` Document object to a temporary file, suitable for viewing or further processing.

import os
import tempfile
from html2docx import Html2Docx

# Example HTML content with basic structure and an external image
html_content = """
<h1>Hello World!</h1>
<p>This is a paragraph with some <strong>bold</strong> and <em>italic</em> text.</p>
<ul>
    <li>List Item 1</li>
    <li>List Item 2</li>
</ul>
<p style="text-align: center;">A centered paragraph.</p>
<img src="https://www.python.org/static/community_logos/python-logo-only.png" width="100px" alt="Python Logo">
"""

# Initialize the parser
new_parser = Html2Docx()

# Parse the HTML and get a python-docx Document object
# (named `document` to avoid shadowing python-docx's `docx` module)
document = new_parser.parse_html_section(html_content)

# Reserve a temporary path, then close the handle before saving:
# on Windows a file that is still open cannot be reopened by name.
with tempfile.NamedTemporaryFile(suffix=".docx", delete=False) as temp_file:
    output_path = temp_file.name

document.save(output_path)
print(f"Generated DOCX saved to: {output_path}")

# To view the file, uncomment the line that matches your OS
# os.startfile(output_path)  # Windows only
# import subprocess; subprocess.call(["open", output_path])  # macOS

# In a real application, remove the temp file once you are done with it
# os.remove(output_path)
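
A .docx file is a ZIP archive, so one quick way to sanity-check the saved output before further processing is to look for the expected parts inside it. The sketch below uses only the standard library. `looks_like_docx` is a hypothetical helper name, not part of html2docx, and the check is deliberately shallow: it assumes the document body sits at the conventional `word/document.xml` location and does not validate the XML.

```python
import zipfile


def looks_like_docx(path):
    """Return True if `path` is a ZIP archive with the usual Word document parts."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    # A Word package declares its content types at the archive root, and the
    # body is conventionally stored at word/document.xml.
    return "[Content_Types].xml" in names and "word/document.xml" in names
```

Passing the path printed by the quickstart to `looks_like_docx` should return `True` for a successfully saved file; a `False` result suggests the save step failed or wrote something other than a Word package.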
