XLSX to HTML Converter
xlsx2html is a Python library designed for converting Excel (XLSX) files into HTML tables while striving to preserve cell formatting. It is actively maintained, with ongoing updates to support newer Python versions, address bugs, and introduce new features for more flexible conversions. The current stable version is 0.6.4.
Common errors
- TypeError: Cell object is not iterable
  cause: This error often occurred in versions prior to 0.5.0, particularly when dealing with merged cells or complex cell structures in the XLSX input.
  fix: Upgrade `xlsx2html` to version 0.5.0 or newer: `pip install --upgrade xlsx2html`.
- UnicodeEncodeError: 'charmap' codec can't encode character...
  cause: This issue was related to `xlsx2html`'s handling of non-unicode system locales, leading to encoding failures for certain characters.
  fix: Upgrade `xlsx2html` to version 0.6.1 or newer: `pip install --upgrade xlsx2html`. This version includes a fix for `UnicodeEncodeError`.
- The output HTML only contains data from the first sheet of my Excel file.
  cause: Before version 0.6.0, the default behavior of `xlsx2html` was to process only the first sheet. Users had to implement custom logic to convert multiple sheets.
  fix: If using `xlsx2html` 0.6.0 or newer, specify `sheet=-1` to convert all sheets (e.g., `xlsx2html(..., sheet=-1)`), or provide a list of sheet indices/names (e.g., `sheet=[0, 1]`). If using an older version, consider upgrading or implementing custom multi-sheet iteration.
Warnings
- breaking Support for Python 3.6 was dropped in version 0.4.2. Users on Python 3.6 must upgrade their Python environment to 3.7 or newer to use `xlsx2html` versions 0.4.2 and beyond.
- gotcha Prior to version 0.6.0, `xlsx2html` would only process the first sheet of an Excel workbook by default. Multi-sheet conversion required specific manual iteration.
- gotcha Specific Excel formatting, such as complex borders on empty cells, merged cells, or certain number formats (e.g., currency symbols), may not be perfectly preserved in the generated HTML due to inherent differences between Excel rendering and HTML/CSS capabilities. This can also lead to issues where empty cells appear without borders.
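- gotcha When passing a file-like object as input, it must be a binary stream: open the XLSX file in binary mode (`'rb'`) or wrap its bytes in `io.BytesIO`. Text mode (`'r'`) or a text stream such as `io.StringIO` will lead to errors, because an XLSX file is a binary ZIP archive. A text stream such as `io.StringIO` is only appropriate for the HTML output.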
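Several notes on this page depend on the installed version, so a small check can help when diagnosing a problem. The sketch below uses only the standard library. The thresholds are copied from this page, the names `FIXED_IN`, `parse_version`, `issues_for`, and `installed_xlsx2html_version` are illustrative, and it assumes plain numeric version strings such as `0.6.4`.

```python
from importlib.metadata import PackageNotFoundError, version

# Version in which each behavior changed, as stated on this page.
FIXED_IN = {
    (0, 5, 0): "TypeError: Cell object is not iterable",
    (0, 6, 0): "only the first sheet is converted by default",
    (0, 6, 1): "UnicodeEncodeError on non-Unicode system locales",
}


def parse_version(text):
    """Turn '0.6.4' into (0, 6, 4); assumes numeric release versions."""
    return tuple(int(part) for part in text.split(".")[:3])


def issues_for(installed):
    """List the issues above that a given version may still run into."""
    current = parse_version(installed)
    return [issue for fixed, issue in sorted(FIXED_IN.items()) if current < fixed]


def installed_xlsx2html_version():
    """Return the installed xlsx2html version string, or None if absent."""
    try:
        return version("xlsx2html")
    except PackageNotFoundError:
        return None
```

For example, `issues_for("0.5.2")` reports the first-sheet default and the encoding error, while a version at or above 0.6.1 yields an empty list.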
Install
pip install xlsx2html
Imports
- xlsx2html
from xlsx2html import xlsx2html
Quickstart
import io

from xlsx2html import xlsx2html

# This example assumes 'example.xlsx' already exists in the working directory.
# You can create one manually or with openpyxl, e.g.:
#   workbook = openpyxl.Workbook(); workbook.active['A1'] = 'Hello'; workbook.save('example.xlsx')

# Option 1: Convert an XLSX file path to an HTML file path
# xlsx2html('path/to/example.xlsx', 'path/to/output.html')

# Option 2: Convert an XLSX file-like object to an HTML string (in-memory)
with open('example.xlsx', 'rb') as xlsx_file:
    output_stream = io.StringIO()
    xlsx2html(xlsx_file, output_stream, locale='en')

output_stream.seek(0)
html_content = output_stream.read()
print(html_content[:500])  # Print first 500 chars of HTML for brevity
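When the workbook arrives as raw bytes rather than a file on disk (for example from an upload), the same in-memory approach can be used by wrapping the bytes in a binary stream first. The helper below is a hypothetical sketch using only the standard library; `as_xlsx_stream` is not part of `xlsx2html`, and the ZIP check is just a cheap sanity test based on the fact that XLSX files are ZIP archives.

```python
import io
import zipfile


def as_xlsx_stream(data: bytes) -> io.BytesIO:
    """Wrap raw workbook bytes in a binary stream usable as converter input."""
    stream = io.BytesIO(data)
    # An XLSX workbook is a ZIP archive, so reject anything that is not one.
    if not zipfile.is_zipfile(stream):
        raise ValueError("data does not look like an XLSX (ZIP) archive")
    stream.seek(0)  # rewind, since the check above moved the position
    return stream
```

The returned stream can then be passed in place of the open file, for example `xlsx2html(as_xlsx_stream(data), output_stream, locale='en')` with `output_stream = io.StringIO()` as in the Quickstart.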