{"id":7289,"library":"html-for-docx","title":"HTML to DOCX Converter","description":"html-for-docx is a Python library designed to convert HTML content into Microsoft Word (.docx) documents easily and efficiently. It is an actively maintained fork of the discontinued `pqzx/html2docx` project, providing a more reliable solution for generating Word documents from various HTML inputs. The current version is 1.1.4, with a consistent release cadence focusing on bug fixes and feature enhancements, including improved CSS and HTML tag support.","status":"active","version":"1.1.4","language":"en","source_language":"en","source_url":"https://github.com/dfop02/html4docx","tags":["html","docx","conversion","word","document","microsoft-word"],"install":[{"cmd":"pip install html-for-docx","lang":"bash","label":"Install latest version"}],"dependencies":[{"reason":"Core functionality relies on `python-docx` for creating and manipulating Word documents.","package":"python-docx"},{"reason":"Used internally for parsing and fixing HTML content, especially when the `Disable Fix-HTML` option is not enabled.","package":"BeautifulSoup4"}],"imports":[{"symbol":"HtmlToDocx","correct":"from html4docx import HtmlToDocx"}],"quickstart":{"code":"from docx import Document\nfrom html4docx import HtmlToDocx\nfrom io import BytesIO\n\n# Example 1: Add HTML to an existing Document object and save\ndocument = Document()  # Or load an existing .docx: Document('template.docx')\nparser = HtmlToDocx()\nhtml_string = '<h1>Hello world</h1><p>This is a <strong>paragraph</strong> with some <em>formatting</em>.</p>'\nparser.add_html_to_document(html_string, document)\ndocument.save('output.docx')\n\nprint(\"Saved 'output.docx' with basic HTML content.\")\n\n# Example 2: Convert an HTML string and save the DOCX to a BytesIO object (in-memory)\nbuffer = BytesIO()\ndocument_in_memory = Document()\nparser_in_memory = HtmlToDocx()\nhtml_string_2 = '<p style=\"color: blue;\">This text is blue.</p>'\n# add_html_to_document expects a python-docx Document, not a buffer\nparser_in_memory.add_html_to_document(
html_string_2, document_in_memory)\n# python-docx can save to any file-like object\ndocument_in_memory.save(buffer)\n\n# To read from the buffer again, reset its position\nbuffer.seek(0)\nprint(f\"Generated DOCX in memory, size: {len(buffer.getvalue())} bytes.\")\n\n# Example 3: Convert an HTML file directly\n# Create a dummy HTML file for demonstration\nwith open('input.html', 'w', encoding='utf-8') as f:\n    f.write('<h2>Content from file</h2><p>This was converted from an HTML file.</p>')\n\nfile_parser = HtmlToDocx()\nfile_parser.parse_html_file('input.html', 'output_from_file.docx')\n\nprint(\"Saved 'output_from_file.docx' from 'input.html'.\")","lang":"python","description":"This quickstart demonstrates the core functionalities of html-for-docx: adding HTML strings to a `python-docx` Document object, saving to a file, saving to an in-memory BytesIO object, and converting directly from an HTML file."},"warnings":[{"fix":"Simplify HTML/CSS where possible. Test complex layouts thoroughly. Consider using the `style_map` option for fine-grained control over how CSS classes map to Word styles.","message":"HTML to DOCX conversion inherently carries limitations, especially with complex CSS layouts, responsive designs, or intricate styling. The output DOCX might not perfectly match the browser's rendering of the HTML.","severity":"gotcha","affected_versions":"All versions"},{"fix":"Set the `table_style` attribute on the `HtmlToDocx` parser instance to apply a predefined Word table style, for example: `parser.table_style = 'Table Grid'`.","message":"By default, tables in the output DOCX will not have any specific styling (e.g., borders).","severity":"gotcha","affected_versions":"All versions"},{"fix":"Always load your template if it contains custom styles: `document = Document('your_template.docx')`. 
Ensure any referenced custom styles exist in the document at generation time; warnings will be logged for missing styles.","message":"If you are using `python-docx` templates with custom styles, these custom styles will not be present if you initialize `document = Document()` without loading the template. This can lead to missing styles when adding HTML content.","severity":"gotcha","affected_versions":"All versions"}],"env_vars":null,"last_verified":"2026-04-16T00:00:00.000Z","next_check":"2026-07-15T00:00:00.000Z","problems":[{"fix":"Ensure that the directory for the output `.docx` file exists before saving. A relative path is resolved against the current working directory, so use an absolute path if in doubt. Python's `os.makedirs(path, exist_ok=True)` can be used to create missing directories.","cause":"The specified output DOCX file path is invalid, or the directory where the file is supposed to be saved does not exist.","error":"FileNotFoundError: [Errno 2] No such file or directory: 'your_file_name.docx'"},{"fix":"Set the `table_style` attribute on your `HtmlToDocx` parser instance before processing, e.g., `parser.table_style = 'Table Grid'` or `parser.table_style = 'Light Shading Accent 1'`. Refer to `python-docx` documentation or Word itself for available table style names.","cause":"The `html-for-docx` library does not apply default styles to tables.","error":"Tables are not showing borders or other expected styles in the output DOCX."},{"fix":"Consult the `html-for-docx` documentation for the list of supported HTML tags and CSS properties. For custom class-based styling, use the `style_map` option. For highest precedence, apply inline CSS with `!important`.","cause":"The library might not support all CSS properties, or there could be style precedence issues. 
Check the documentation for currently supported properties.","error":"Specific HTML tags or inline CSS styles (e.g., `color`, `font-size`) are not being applied, or render incorrectly in the DOCX output."},{"fix":"Upgrade `html-for-docx` to version 1.1.3 or higher, as this specific bug was fixed in that release.","cause":"Older versions (prior to 1.1.3) had a bug handling specific image formats, notably those with RGBA color profiles.","error":"Crash or incorrect rendering when processing images with RGBA color profiles."}]}